Abstract. As of today, there exists no standard language for querying Linked
Data on the Web, where navigation across distributed data sources is a key feature.

A natural candidate seems to be SPARQL, which recently has been enhanced 
with navigational capabilities thanks to the introduction of property paths (PPs). 
However, the semantics of SPARQL restricts the scope of navigation via PPs to 
single RDF graphs. This restriction limits the applicability of PPs on the Web. To 
fill this gap, in this paper we provide formal foundations for evaluating PPs on 
the Web, thus contributing to the definition of a query language for Linked Data. 

In particular, we introduce a query semantics for PPs that couples navigation at 
the data level with navigation on the Web graph. Given this semantics we find 
that for some PP-based SPARQL queries a complete evaluation on the Web is not 
feasible. To enable systems to identify queries that can be evaluated completely, 
we establish a decidable syntactic property of such queries. 

1 Introduction 

The increasing trend in sharing and interlinking pieces of structured data on the World
Wide Web (WWW) is evolving the classical Web (which is focused on hypertext documents
and syntactic links among them) into a Web of Linked Data. The Linked Data
principles [4] present an approach to extend the scope of Uniform Resource Identifiers
(URIs) to new types of resources (e.g., people, places) and represent their descriptions
and interlinks by using the Resource Description Framework (RDF) [16] as the standard
data format. RDF adopts a graph-based data model, which can be queried by
using the SPARQL query language [12]. When it comes to Linked Data on the WWW,
the common way to provide query-based access is via SPARQL endpoints, that is, services
that usually answer SPARQL queries over a single dataset. Recently, the original
core of SPARQL has been extended with features supporting query federation; it is now
possible, within a single query, to target multiple endpoints (via the SERVICE operator).
However, such an extension is not enough to cope with an unbounded and a priori unknown
space of data sources such as the WWW. Moreover, not all Linked Data on the
WWW is accessible via SPARQL endpoints. Hence, as of today, there exists no standard
query language for Linked Data on the WWW, although SPARQL is clearly a candidate.

While earlier research on using SPARQL for Linked Data is limited to fragments of
the first version of the language [5,13,14,25], the more recent version 1.1 introduces a
feature that is particularly interesting in the context of queries over a graph-like environment
such as Linked Data on the WWW. This feature is called property paths (PPs) and
equips SPARQL with navigational capabilities [12]. However, the standard definition
of PPs is limited to single, centralized RDF graphs and, thus, not directly applicable to
Linked Data that is distributed over the WWW. Therefore, toward the definition of a
language for accessing Linked Data live on the WWW, the following questions emerge
naturally: “How can PPs be defined over the WWW?” and “What are the implications
of such a definition?” Answering these questions is the broad objective of this paper.
To this end, we make the following main contributions:

1. We formalize a query semantics for PP-based SPARQL queries that are meant to
be evaluated over Linked Data on the WWW. This semantics is context-based; it
intertwines Web graph navigation with navigation at the level of data.

2. We study the feasibility of evaluating queries under this semantics. We assume that
query engines do not have complete information about the queried Web of Linked
Data (as is the case for the WWW). Our study shows that there exist cases in
which query evaluation under the context-based semantics is not feasible.

3. We provide a decidable syntactic property of queries for which an evaluation under 
the context-based semantics is feasible. 

The remainder of the paper is organized as follows. Section 2 provides an overview
of related work. Section 3 introduces the formal framework for this paper, including
a data model that captures a notion of Linked Data. In Section 4 we focus on PPs, 
independently from other SPARQL operators. In Section 5 we broaden our view to 
study PP-based SPARQL graph patterns; we characterize a class of Web-safe patterns 
and prove their feasibility. Finally, in Section 6 we conclude and sketch future work. 

2 Related Work 

The idea of querying the WWW as a database is not new (see Florescu et al.’s survey [11]).
Perhaps the most notable early works in this context are by Konopnicki and
Shmueli [18], Abiteboul and Vianu [1], and Mendelzon et al. [20], all of which tackled
the problem of evaluating SQL-like queries on the traditional hypertext Web. While
such queries included navigational features, the focus was on retrieving specific Web
pages, particular attributes of specific pages, or content within them.

From a graph-oriented perspective, languages for the navigation and specification
of vertices in graphs have a long tradition (see Wood’s survey [26]). In the RDF world,
extensions of SPARQL such as PSPARQL [2], nSPARQL [21], and SPARQLeR [17] introduced
navigational features since those were missing in the first version of SPARQL.
Only recently, with the addition of property paths (PPs) in version 1.1 [12], SPARQL
has been enhanced officially with such features. The final definition of PPs has been
influenced by research that studied the computational complexity of an early draft version
of PPs [3,19], and there also already exists a proposal to extend PPs with more
expressive power [9]. However, the main assumption of all these navigational extensions
of SPARQL is to work on a single, centralized RDF graph. Our departure point is
different: We aim at defining semantics of SPARQL queries (including property paths)
over Linked Data on the WWW, which involves dealing with two graphs of different
types; namely, an RDF graph that is distributed over documents on the WWW and the
Web graph of how these documents are interlinked with each other.

To express queries over Linked Data on the WWW, two main strands of research
can be identified. The first studies how to extend the scope of SPARQL queries to the
WWW, with existing work focusing on basic graph patterns [5,13,25] or a more expressive
fragment that includes AND, OPT, UNION, and FILTER [14]. The second strand focuses
on navigational languages such as NautiLOD [8,10]. These two strands have different
departure points. The former employs navigation over the WWW to collect data for answering
a given SPARQL query; here navigation is a means to discover query-relevant
data. The latter provides explicit navigational features and uses querying capabilities
to filter data sources of interest; here navigation (not querying) is the main focus. The
context-based query semantics proposed in this paper combines both approaches. We
believe that the outcome of this research can be a starting point toward the definition of
a language for querying and navigating over Linked Data on the WWW.

3 Formal Framework 

This section provides a formal framework for studying semantics of PPs over Linked 
Data. We first recall the definition of PPs as per the SPARQL standard [12]. Thereafter, 
we introduce a data model that captures the notion of Linked Data on the WWW. 

3.1 Preliminaries 

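Assume four pairwise disjoint, countably infinite sets $\mathcal{I}$ (IRIs), $\mathcal{B}$ (blank nodes), $\mathcal{L}$ (literals),
and $\mathcal{V}$ (variables). An RDF triple (or simply triple) is a tuple from the set
$\mathcal{T} = (\mathcal{I} \cup \mathcal{B}) \times \mathcal{I} \times (\mathcal{I} \cup \mathcal{B} \cup \mathcal{L})$. For any triple $t \in \mathcal{T}$ we write $\mathrm{iris}(t)$ to denote
the set of IRIs in that triple. A set of triples is called an RDF graph.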

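A property path pattern (or PP pattern for short) is a tuple $P = \langle \alpha, \mathit{path}, \beta \rangle$ such
that $\alpha, \beta \in (\mathcal{I} \cup \mathcal{L} \cup \mathcal{V})$ and $\mathit{path}$ is a property path expression (PP expression) defined
by the following grammar (where $u, u_1, \ldots, u_n \in \mathcal{I}$):

$\mathit{path} \;=\; u \;\mid\; !(u_1 \mid \ldots \mid u_n) \;\mid\; \hat{}\,\mathit{path} \;\mid\; \mathit{path}/\mathit{path} \;\mid\; (\mathit{path} \mid \mathit{path}) \;\mid\; (\mathit{path})^*$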

Note that the SPARQL standard introduces additional types of PP expressions [12].
Since these are merely syntactic sugar (they are defined in terms of expressions covered
by the grammar given above), we ignore them in this paper. As another slight deviation
from the standard, we do not permit blank nodes in PP patterns (i.e., $\alpha, \beta \notin \mathcal{B}$).
However, standard PP patterns with blank nodes can be simulated using fresh variables.

Example 1. An example of a PP pattern is $\langle \mathrm{Tim}, (\mathrm{knows})^*/\mathrm{name}, ?n \rangle$, which retrieves
the names of persons that can be reached from Tim by an arbitrarily long path of
knows relationships (which includes Tim himself). Two further examples are the PP patterns
$\langle ?p, \mathrm{knows}, \mathrm{Tim} \rangle$ and $\langle \mathrm{Tim}, \hat{}\,\mathrm{knows}, ?p \rangle$, both of which retrieve persons that know Tim.

The (standard) query semantics of PP patterns is defined by an evaluation function that
returns multisets of solution mappings, where a solution mapping $\mu$ is a partial function
$\mu : \mathcal{V} \to (\mathcal{I} \cup \mathcal{B} \cup \mathcal{L})$. Given a solution mapping $\mu$ and a PP pattern $P$, we write $\mu[P]$ to
denote the PP pattern obtained by replacing the variables in $P$ according to $\mu$ (unbound
variables must not be replaced). Two solution mappings, say $\mu_1$ and $\mu_2$, are compatible,
denoted by $\mu_1 \sim \mu_2$, if $\mu_1(?v) = \mu_2(?v)$ for all variables $?v \in (\mathrm{dom}(\mu_1) \cap \mathrm{dom}(\mu_2))$.

Function ALP1($\gamma$, path, G)
  Input: $\gamma \in (\mathcal{I} \cup \mathcal{B} \cup \mathcal{L})$, path is a PP expression, G is an RDF graph.
  1: Visited := $\emptyset$
  2: ALP2($\gamma$, path, Visited, G)
  3: return Visited

Function ALP2($\gamma$, path, Visited, G)
  Input: $\gamma \in (\mathcal{I} \cup \mathcal{B} \cup \mathcal{L})$, path is a PP expression,
         Visited $\subseteq (\mathcal{I} \cup \mathcal{B} \cup \mathcal{L})$, G is an RDF graph.
  4: if $\gamma \notin$ Visited then
  5:   add $\gamma$ to Visited
  6:   for all $\mu \in [\![\langle ?x, \mathit{path}, ?y \rangle]\!]_G$ s.t. $\mu(?x) = \gamma$ do      // $?x, ?y \in \mathcal{V}$
  7:     ALP2($\mu(?y)$, path, Visited, G)

Figure 1. Auxiliary functions for defining the semantics of PP expressions of the form path*.

We represent a multiset of solution mappings by a pair $M = (\Omega, \mathit{card})$ where $\Omega$ is
the underlying set (of solution mappings) and $\mathit{card} : \Omega \to \{1, 2, \ldots\}$ is the corresponding
cardinality function. By abusing notation slightly, we write $\mu \in M$ for all $\mu \in \Omega$.
Furthermore, we introduce a family of special (parameterized) cardinality functions that
simplify the definition of any multiset whose solution mappings all have a cardinality
of 1. That is, for any set of solution mappings $\Omega$, let $\mathit{card1}^{(\Omega)} : \Omega \to \{1, 2, \ldots\}$ be
the constant-1 cardinality function that is defined by $\mathit{card1}^{(\Omega)}(\mu) = 1$ for all $\mu \in \Omega$.

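To define the aforementioned evaluation function we also need to introduce several
SPARQL algebra operators. Let $M_1 = (\Omega_1, \mathit{card}_1)$ and $M_2 = (\Omega_2, \mathit{card}_2)$ be multisets
of solution mappings and let $V \subseteq \mathcal{V}$ be a finite set of variables. Then:

$M_1 \sqcup M_2 = (\Omega, \mathit{card})$ where $\Omega = \Omega_1 \cup \Omega_2$ and (i) $\mathit{card}(\mu) = \mathit{card}_1(\mu)$ for all solution
mappings $\mu \in \Omega \setminus \Omega_2$, (ii) $\mathit{card}(\mu) = \mathit{card}_2(\mu)$ for all $\mu \in \Omega \setminus \Omega_1$, and
(iii) $\mathit{card}(\mu) = \mathit{card}_1(\mu) + \mathit{card}_2(\mu)$ for all $\mu \in \Omega_1 \cap \Omega_2$.

$M_1 \bowtie M_2 = (\Omega, \mathit{card})$ where $\Omega = \{\, \mu_1 \cup \mu_2 \mid (\mu_1, \mu_2) \in \Omega_1 \times \Omega_2 \text{ and } \mu_1 \sim \mu_2 \,\}$ and,
for every $\mu \in \Omega$, $\mathit{card}(\mu) = \mathit{card}_1(\mu_1) \cdot \mathit{card}_2(\mu_2)$.

$M_1 \setminus M_2 = (\Omega, \mathit{card})$ where $\Omega = \{\, \mu_1 \in \Omega_1 \mid \mu_1 \not\sim \mu_2 \text{ for all } \mu_2 \in \Omega_2 \,\}$ and, for
every $\mu \in \Omega$, $\mathit{card}(\mu) = \mathit{card}_1(\mu)$.

$\pi_V(M_1) = (\Omega, \mathit{card})$ where $\Omega = \{\, \mu \mid \exists \mu' \in \Omega_1 : \mu \sim \mu' \text{ and } \mathrm{dom}(\mu) = V \cap \mathrm{dom}(\mu') \,\}$
and, for every $\mu \in \Omega$, $\mathit{card}(\mu) = \mathit{card}_1(\mu')$.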
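The operators above can be illustrated with a minimal Python sketch. It is not part of the formal framework: the representation (a solution mapping as a frozenset of variable/term pairs, a multiset as a `collections.Counter`) and all function names are assumptions made for illustration. Where several pairs of mappings produce the same result mapping, the sketch adds up their cardinalities, as the SPARQL standard does for join and projection.

```python
from collections import Counter

def mapping(**bindings):
    """Build a hashable solution mapping, e.g. mapping(v="Bob") (illustrative helper)."""
    return frozenset(bindings.items())

def compatible(m1, m2):
    """m1 ~ m2: both mappings agree on every variable they share."""
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())

def union(M1, M2):
    """Multiset union: cardinalities of common mappings add up."""
    return M1 + M2

def join(M1, M2):
    """Merge all compatible pairs; cardinalities multiply (and accumulate)."""
    out = Counter()
    for m1, c1 in M1.items():
        for m2, c2 in M2.items():
            if compatible(m1, m2):
                out[m1 | m2] += c1 * c2
    return out

def minus(M1, M2):
    """Keep the mappings of M1 that are compatible with no mapping of M2."""
    return Counter({m1: c1 for m1, c1 in M1.items()
                    if not any(compatible(m1, m2) for m2 in M2)})

def project(M, variables):
    """Restrict every mapping to the given set of variables."""
    out = Counter()
    for m, c in M.items():
        out[frozenset((v, t) for v, t in m if v in variables)] += c
    return out
```

For instance, joining a multiset that contains the mapping {x: a, y: b} twice with one that contains {y: b, z: c} three times yields {x: a, y: b, z: c} with cardinality 6.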

In addition to these algebra operators, the SPARQL standard introduces auxiliary functions
to define the semantics of PP patterns of the form $\langle \alpha, \mathit{path}^*, \beta \rangle$. Figure 1 provides
these functions, which we call ALP1 and ALP2, adapted to our formalism.^1
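The behavior of ALP1 and ALP2 can be sketched in a few lines of Python. This is an illustrative approximation, not the standard's definition: the callable `step` is an assumed stand-in for evaluating the pattern ⟨?x, path, ?y⟩ over the graph and returning μ(?y) for every solution μ with μ(?x) = γ, and `knows_step` is a hypothetical example of such a callable.

```python
def alp1(gamma, step, graph):
    """Collect every term reachable from gamma by zero or more path steps."""
    visited = set()
    alp2(gamma, step, visited, graph)
    return visited

def alp2(gamma, step, visited, graph):
    """Depth-first traversal; the visited set guarantees termination on cycles."""
    if gamma not in visited:
        visited.add(gamma)
        for nxt in step(gamma, graph):
            alp2(nxt, step, visited, graph)

def knows_step(term, graph):
    """Hypothetical step function for the PP expression `knows`."""
    return {o for (s, p, o) in graph if s == term and p == "knows"}
```

Because the start term is added to the visited set before any step is taken, the result always contains the start term itself, which corresponds to the zero-length path.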

We are now ready to define the standard query semantics of PP patterns. 

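Definition 1. The evaluation of a PP pattern $P$ over an RDF graph $G$, denoted by
$[\![P]\!]_G$, is a multiset of solution mappings $(\Omega, \mathit{card})$ that is defined recursively as given
in Figure 2, where $\alpha, \beta \in (\mathcal{I} \cup \mathcal{L} \cup \mathcal{V})$, $x_L, x_R \in (\mathcal{I} \cup \mathcal{L})$, $?v_L, ?v_R \in \mathcal{V}$, $u, u_1, \ldots, u_n \in \mathcal{I}$,
$?v \in \mathcal{V}$ is a fresh variable, and $\mu_\emptyset$ denotes the empty solution mapping ($\mathrm{dom}(\mu_\emptyset) = \emptyset$).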


^1 Variable ?x in line 6 is necessary since PP patterns in our formalism do not have blank nodes.







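\[
\begin{array}{rcl}
[\![\langle \alpha, u, \beta \rangle]\!]_G & = & \big( \{\, \mu \mid \mathrm{dom}(\mu) = (\{\alpha, \beta\} \cap \mathcal{V}) \text{ and } \mu[\langle \alpha, u, \beta \rangle] \in G \,\},\ \mathit{card1}^{(\Omega)} \big) \\[4pt]
[\![\langle \alpha, !(u_1 \mid \ldots \mid u_n), \beta \rangle]\!]_G & = & \big( \{\, \mu \mid \mathrm{dom}(\mu) = (\{\alpha, \beta\} \cap \mathcal{V}) \text{ and } \\
 & & \quad \exists\, \mu[\langle \alpha, u, \beta \rangle] \in G : u \in (\mathcal{I} \setminus \{u_1, \ldots, u_n\}) \,\},\ \mathit{card1}^{(\Omega)} \big) \\[4pt]
[\![\langle \alpha, \hat{}\,\mathit{path}, \beta \rangle]\!]_G & = & [\![\langle \beta, \mathit{path}, \alpha \rangle]\!]_G \\[4pt]
[\![\langle \alpha, \mathit{path}_1/\mathit{path}_2, \beta \rangle]\!]_G & = & \pi_{\{\alpha, \beta\} \cap \mathcal{V}} \big( [\![\langle \alpha, \mathit{path}_1, ?v \rangle]\!]_G \bowtie [\![\langle ?v, \mathit{path}_2, \beta \rangle]\!]_G \big) \\[4pt]
[\![\langle \alpha, (\mathit{path}_1 \mid \mathit{path}_2), \beta \rangle]\!]_G & = & [\![\langle \alpha, \mathit{path}_1, \beta \rangle]\!]_G \sqcup [\![\langle \alpha, \mathit{path}_2, \beta \rangle]\!]_G \\[4pt]
[\![\langle x_L, (\mathit{path})^*, ?v_R \rangle]\!]_G & = & \big( \{\, \mu \mid \mathrm{dom}(\mu) = \{?v_R\} \text{ and } \mu(?v_R) \in \mathrm{ALP1}(x_L, \mathit{path}, G) \,\},\ \mathit{card1}^{(\Omega)} \big) \\[4pt]
[\![\langle ?v_L, (\mathit{path})^*, ?v_R \rangle]\!]_G & = & \big( \{\, \mu \mid \mathrm{dom}(\mu) = \{?v_L, ?v_R\} \text{ and } \mu(?v_L) \in \mathrm{terms}(G) \text{ and } \\
 & & \quad \mu(?v_R) \in \mathrm{ALP1}(\mu(?v_L), \mathit{path}, G) \,\},\ \mathit{card1}^{(\Omega)} \big) \\[4pt]
[\![\langle ?v_L, (\mathit{path})^*, x_R \rangle]\!]_G & = & [\![\langle x_R, (\hat{}\,\mathit{path})^*, ?v_L \rangle]\!]_G \\[4pt]
[\![\langle x_L, (\mathit{path})^*, x_R \rangle]\!]_G & = & \begin{cases} \big( \{\mu_\emptyset\},\ \mathit{card1}^{(\{\mu_\emptyset\})} \big) & \text{if } \exists\, \mu \in [\![\langle x_L, (\mathit{path})^*, ?v \rangle]\!]_G : \mu(?v) = x_R, \\ \big( \emptyset,\ \mathit{card1}^{(\emptyset)} \big) & \text{else.} \end{cases}
\end{array}
\]

Figure 2. SPARQL 1.1 W3C property paths semantics.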
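To make the recursive structure of Figure 2 more tangible, the following Python sketch evaluates PP expressions over a small in-memory RDF graph. It is a simplification under stated assumptions: it computes only the set of (start, end) term pairs connected by a path and ignores cardinalities; PP expressions are encoded as nested tuples (an encoding invented here); terms(G) is taken to be the subjects and objects of the graph; and zero-length paths are only produced for terms that occur in the graph.

```python
def pairs(expr, graph):
    """Return all (start, end) pairs connected by PP expression `expr` in `graph`."""
    kind = expr[0]
    if kind == "iri":                       # u
        return {(s, o) for (s, p, o) in graph if p == expr[1]}
    if kind == "neg":                       # !(u1 | ... | un)
        return {(s, o) for (s, p, o) in graph if p not in expr[1]}
    if kind == "inv":                       # ^path
        return {(o, s) for (s, o) in pairs(expr[1], graph)}
    if kind == "seq":                       # path1/path2
        left, right = pairs(expr[1], graph), pairs(expr[2], graph)
        return {(a, d) for (a, b) in left for (c, d) in right if b == c}
    if kind == "alt":                       # (path1 | path2)
        return pairs(expr[1], graph) | pairs(expr[2], graph)
    if kind == "star":                      # (path)*
        step = pairs(expr[1], graph)
        terms = {t for (s, p, o) in graph for t in (s, o)}
        closure = {(t, t) for t in terms}   # zero-length paths
        frontier = set(closure)
        while frontier:                     # extend paths until nothing new appears
            new = {(a, d) for (a, b) in frontier
                   for (c, d) in step if b == c} - closure
            closure |= new
            frontier = new
        return closure
    raise ValueError(f"unknown PP expression: {expr!r}")
```

With this encoding, the first pattern of Example 1 corresponds to selecting the end terms of all pairs of `("seq", ("star", ("iri", "knows")), ("iri", "name"))` whose start term is Tim.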


3.2 Data Model 

The standard SPARQL evaluation function for PP patterns (cf. Section 3.1) defines the
expected result of the evaluation of a pattern over a single RDF graph. Since the WWW
is not an RDF graph, the standard definition is insufficient as a formal foundation for
evaluating PP patterns over Linked Data on the WWW. To provide a suitable definition
we need a data model that captures the notion of a Web of Linked Data. To this
end, we adopt the data model proposed in our earlier work [14]. Here, a Web of Linked
Data (WoLD) is a tuple $W = \langle D, \mathit{data}, \mathit{adoc} \rangle$ consisting of (i) a set $D$ of so-called
Linked Data documents (documents), (ii) a mapping $\mathit{data} : D \to 2^{\mathcal{T}}$ that maps each
document to a finite set of RDF triples (representing the data that can be obtained from
the document), and (iii) a partial mapping $\mathit{adoc} : \mathcal{I} \to D$ that maps (some) IRIs to a
document and, thus, captures an IRI-based retrieval of documents. In this paper we assume
that the set of documents $D$ in any WoLD $W = \langle D, \mathit{data}, \mathit{adoc} \rangle$ is finite, in which
case we say $W$ is finite (for a discussion of infiniteness refer to our earlier work [14]).

A few other concepts are needed for the subsequent discussion. For any two documents
$d, d' \in D$ in a WoLD $W = \langle D, \mathit{data}, \mathit{adoc} \rangle$, document $d$ has a data link to $d'$
if the data of $d$ mentions an IRI $u \in \mathcal{I}$ (i.e., there exists a triple $\langle s, p, o \rangle \in \mathit{data}(d)$
with $u \in \{s, p, o\}$) that can be used to retrieve $d'$ (i.e., $\mathit{adoc}(u) = d'$). Such data links
establish the link graph of the WoLD $W$, that is, a directed graph $(D, E)$ in which the
edges $E$ are all pairs $\langle d, d' \rangle \in D \times D$ for which $d$ has a data link to $d'$. Note that this
graph, as well as the tuple $\langle D, \mathit{data}, \mathit{adoc} \rangle$, typically is not available directly to systems
that aim to compute queries over the Web captured by $W$. For instance, the complete
domain of the partial mapping $\mathit{adoc}$ (i.e., all IRIs that can be used to retrieve some document)
is unknown to such systems and can only be disclosed partially (by trying to
look up IRIs). Also note that the link graph of a WoLD is a different type of graph than
the RDF “graph” whose triples are distributed over the documents in the WoLD.
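The data model can be mirrored by a small Python sketch. The class and field names below are illustrative assumptions (documents are identified by plain strings, and the set D is implicit in the keys of `data`); the sketch only shows how the link graph follows from `data` and `adoc`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WoLD:
    data: dict   # document id -> set of (s, p, o) triples
    adoc: dict   # IRI -> document id (partial mapping: missing keys are undefined)

def link_graph(web):
    """Edges (d, d') such that data(d) mentions an IRI u with adoc(u) = d'."""
    edges = set()
    for d, triples in web.data.items():
        for triple in triples:
            for term in triple:              # subject, predicate, and object
                target = web.adoc.get(term)
                if target is not None:
                    edges.add((d, target))
    return edges
```

Note that such a function presupposes direct access to `data` and `adoc`, which a query system on the real WWW does not have; it is meant as an aid for understanding the definitions, not as a component of a query engine.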


4 Web-aware Query Semantics for Property Paths 

We are now ready to introduce our framework, which does not deal with syntactic aspects
of PPs but aims at defining query semantics that provide a formal foundation for
using PP patterns as queries over a WoLD (and, thus, over Linked Data on the WWW).

4.1 Full-Web Query Semantics 

As a first approach we may assume a full-Web query semantics that is based on the 
standard evaluation function (as introduced in Section 3.1) and defines an expected 
query result for any PP pattern in terms of all data on the queried WoLD. Formally: 

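Definition 2. Let $P$ be a PP pattern, let $W = \langle D, \mathit{data}, \mathit{adoc} \rangle$ be a WoLD, and let $G^*$
be an RDF graph such that $G^* = \bigcup_{d \in D} \mathit{data}(d)$. Then the evaluation of $P$ over $W$
under full-Web semantics, denoted by $[\![P]\!]^{\mathsf{fw}}_W$, is defined by $[\![P]\!]^{\mathsf{fw}}_W = [\![P]\!]_{G^*}$.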

We emphasize that the full-Web query semantics is mostly of theoretical interest. In
practice, that is, for a WoLD $W$ that represents the “real” WWW (as it runs on the
Internet), there cannot exist a system that guarantees to compute the given evaluation
function over $W$ using an algorithm that both terminates and returns complete
query results. In earlier work, we showed such a limitation for evaluating other types
of SPARQL graph patterns (including triple patterns) under a corresponding full-Web
query semantics defined for these patterns [14]. This result readily carries over to the
full-Web query semantics for PP patterns because any PP pattern $P = \langle \alpha, \mathit{path}, \beta \rangle$
with PP expression $\mathit{path}$ being an IRI $u \in \mathcal{I}$ is, in fact, a triple pattern $\langle \alpha, u, \beta \rangle$. Informally,
we explain this negative result by the fact that the three structures $D$, $\mathit{data}$, and
$\mathit{adoc}$ that capture the queried Web formally are not available in practice. Consequently,
to enumerate the set of all triples on the Web (i.e., the RDF graph $G^*$ in Definition 2), a
query execution system would have to enumerate all documents (the set $D$); given that
such a system has limited access to mapping $\mathit{adoc}$ (in particular, $\mathrm{dom}(\mathit{adoc})$, the set
of all IRIs whose lookup retrieves a document, is, at best, partially known), the only
way to guarantee the discovery of all documents is to look up every possible (HTTP-scheme) IRI.
Since there are infinitely many such IRIs [7], the enumeration process cannot terminate.

4.2 Context-Based Query Semantics 

Given the limited practical applicability of full-Web query semantics for PPs, we propose
an alternative query semantics that interprets PP patterns as a language for navigation
over Linked Data on the Web (i.e., along the lines of earlier navigational languages
for Linked Data such as NautiLOD [8]). We refer to this semantics as context-based.

The main idea behind this query semantics is to restrict the scope of searching for
any next triple of a potentially matching path to specific data within specific documents
on the queried WoLD. As a basis for formalizing these restrictions we introduce the
notion of a context selector. Informally, for each IRI that can be used to retrieve a
document, the context selector returns a specific subset of the data within that document;
this subset contains only those RDF triples that have the given IRI as their subject (such
a set of triples resembles Harth and Speiser’s notion of subject authoritative triples [13]).
Formally, for any WoLD $W = \langle D, \mathit{data}, \mathit{adoc} \rangle$, the context selector of $W$ is a function
$C^W$ that, for each $\gamma \in (\mathcal{I} \cup \mathcal{B} \cup \mathcal{L} \cup \mathcal{V})$, is defined as follows:^2

\[
C^W(\gamma) =
\begin{cases}
\{\, \langle s, p, o \rangle \in \mathit{data}(\mathit{adoc}(\gamma)) \mid \gamma = s \,\} & \text{if } \gamma \in \mathcal{I} \text{ and } \gamma \in \mathrm{dom}(\mathit{adoc}), \\
\emptyset & \text{otherwise.}
\end{cases}
\]


Informally, we explain how a context selector restricts the scope of PP patterns over a
WoLD as follows. Suppose a sequence of triples $\langle s_1, p_1, o_1 \rangle, \ldots, \langle s_k, p_k, o_k \rangle$ presents
a path that already matches a sub-expression of a given PP expression. Under the previously
defined full-Web query semantics (cf. Section 4.1), the next triple for such a
path can be searched for in an arbitrary document in the queried WoLD $W$. By contrast,
under the context-based query semantics, the next triple has to be searched for only
in $C^W(o_k)$. Given these preliminaries, we now define context-based semantics:
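As a minimal sketch of the context selector, the following Python function assumes that a WoLD is given by two dictionaries, `data` (document id to set of triples) and `adoc` (IRI to document id); both names and the representation are illustrative assumptions rather than part of the formalization.

```python
def context(data, adoc, gamma):
    """C^W(gamma): triples with subject gamma in the document retrieved via gamma."""
    doc = adoc.get(gamma)        # undefined for literals, blank nodes, variables, unknown IRIs
    if doc is None:
        return set()
    return {(s, p, o) for (s, p, o) in data[doc] if s == gamma}
```

In particular, a triple about Tim that happens to be stored in the document retrieved for Bob is not part of the context of Bob, and it is not part of the context of Tim either unless the document retrieved for Tim contains it as well.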

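Definition 3. Let $P$ be a PP pattern and let $W = \langle D, \mathit{data}, \mathit{adoc} \rangle$ be a WoLD. The
evaluation of $P$ over $W$ under context-based semantics, denoted by $[\![P]\!]^{\mathsf{ctxt}}_W$, returns a
multiset of solution mappings $(\Omega, \mathit{card})$ defined recursively as given in Figure 3, where
$u, u_1, \ldots, u_n \in \mathcal{I}$; $x_L, x_R \in (\mathcal{I} \cup \mathcal{L})$; $?v_L, ?v_R \in \mathcal{V}$; $\mu_\emptyset$ is the empty solution mapping (i.e.,
$\mathrm{dom}(\mu_\emptyset) = \emptyset$); function ALPW1 is given in Figure 4; and $?v \in \mathcal{V}$ is a fresh variable.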

There are three points worth mentioning w.r.t. Definition 3: First, note how the context
selector restricts the data that has to be searched to find matching triples (e.g., consider
the first line in Figure 3). Second, we emphasize that context-based query semantics is
defined such that it resembles the standard semantics of PP patterns as closely as possible
(cf. Section 3.1). Therefore, for the part of our definition that covers PP patterns of
the form $\langle \alpha, \mathit{path}^*, \beta \rangle$, we also use auxiliary functions, namely ALPW1 and ALPW2 (cf. Figure 4).
These functions evaluate the sub-expression $\mathit{path}$ recursively over the queried
WoLD (instead of using a fixed RDF graph as done in the standard semantics in Figure 1).
Third, the two base cases with a variable in the subject position (i.e., the third
and the sixth line in Figure 3) require an enumeration of all IRIs. Such a requirement
is necessary to preserve consistency with the standard semantics, as well as to preserve
commutativity of operators that can be defined on top of PP patterns (such as the
AND operator in SPARQL; cf. Section 5). However, due to this requirement there exist
PP patterns whose (complete) evaluation under context-based semantics is infeasible
when querying the WWW. The following example describes such a case.

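Example 2. Consider the PP pattern $P_{E2} = \langle ?v, \mathrm{knows}, \mathrm{Tim} \rangle$, which asks for the IRIs
of people that know Tim. Under context-based semantics, any IRI $u'$ can be used to
generate a correct solution mapping for the pattern as long as a lookup of that IRI
results in retrieving a document whose data includes the triple $\langle u', \mathrm{knows}, \mathrm{Tim} \rangle$. While,
for any WoLD that is finite, there exists only a finite number of such IRIs, determining
these IRIs and guaranteeing completeness requires enumerating the infinite set of all
IRIs and checking each of them (unless one knows the complete, and finite, subset of
all IRIs that can be used to retrieve some document, which, due to the infiniteness of
possible HTTP IRIs, cannot be achieved for the WWW).

^2 To simplify the following formalization of context-based semantics, context selectors are defined
not only over IRIs, but also over blank nodes, literals, and variables.

\[
\begin{array}{rcl}
[\![\langle u_L, u, \beta \rangle]\!]^{\mathsf{ctxt}}_W & = & \big( \{\, \mu \mid \mathrm{dom}(\mu) = (\{\beta\} \cap \mathcal{V}) \text{ and } \mu[\langle u_L, u, \beta \rangle] \in C^W(u_L) \,\},\ \mathit{card1}^{(\Omega)} \big) \\[4pt]
[\![\langle \ell_L, u, \beta \rangle]\!]^{\mathsf{ctxt}}_W & = & \big( \emptyset,\ \mathit{card1}^{(\emptyset)} \big) \\[4pt]
[\![\langle ?v_L, u, \beta \rangle]\!]^{\mathsf{ctxt}}_W & = & \big( \{\, \mu \mid \mathrm{dom}(\mu) = (\{?v_L, \beta\} \cap \mathcal{V}) \text{ and } \\
 & & \quad \mu[\langle ?v_L, u, \beta \rangle] \in \bigcup_{u' \in \mathcal{I}} C^W(u') \,\},\ \mathit{card1}^{(\Omega)} \big) \\[4pt]
[\![\langle u_L, !(u_1 \mid \ldots \mid u_n), \beta \rangle]\!]^{\mathsf{ctxt}}_W & = & \big( \{\, \mu \mid \mathrm{dom}(\mu) = (\{\beta\} \cap \mathcal{V}) \text{ and } \\
 & & \quad \exists\, \mu[\langle u_L, p, \beta \rangle] \in C^W(u_L) : p \notin \{u_1, \ldots, u_n\} \,\},\ \mathit{card1}^{(\Omega)} \big) \\[4pt]
[\![\langle \ell_L, !(u_1 \mid \ldots \mid u_n), \beta \rangle]\!]^{\mathsf{ctxt}}_W & = & \big( \emptyset,\ \mathit{card1}^{(\emptyset)} \big) \\[4pt]
[\![\langle ?v_L, !(u_1 \mid \ldots \mid u_n), \beta \rangle]\!]^{\mathsf{ctxt}}_W & = & \big( \{\, \mu \mid \mathrm{dom}(\mu) = (\{?v_L, \beta\} \cap \mathcal{V}) \text{ and } \\
 & & \quad \exists\, \mu[\langle ?v_L, p, \beta \rangle] \in \bigcup_{u' \in \mathcal{I}} C^W(u') : p \notin \{u_1, \ldots, u_n\} \,\},\ \mathit{card1}^{(\Omega)} \big) \\[4pt]
[\![\langle \alpha, \hat{}\,\mathit{path}, \beta \rangle]\!]^{\mathsf{ctxt}}_W & = & [\![\langle \beta, \mathit{path}, \alpha \rangle]\!]^{\mathsf{ctxt}}_W \\[4pt]
[\![\langle \alpha, \mathit{path}_1/\mathit{path}_2, \beta \rangle]\!]^{\mathsf{ctxt}}_W & = & \pi_{\{\alpha, \beta\} \cap \mathcal{V}} \big( [\![\langle \alpha, \mathit{path}_1, ?v \rangle]\!]^{\mathsf{ctxt}}_W \bowtie [\![\langle ?v, \mathit{path}_2, \beta \rangle]\!]^{\mathsf{ctxt}}_W \big) \\[4pt]
[\![\langle \alpha, (\mathit{path}_1 \mid \mathit{path}_2), \beta \rangle]\!]^{\mathsf{ctxt}}_W & = & [\![\langle \alpha, \mathit{path}_1, \beta \rangle]\!]^{\mathsf{ctxt}}_W \sqcup [\![\langle \alpha, \mathit{path}_2, \beta \rangle]\!]^{\mathsf{ctxt}}_W \\[4pt]
[\![\langle x_L, (\mathit{path})^*, ?v_R \rangle]\!]^{\mathsf{ctxt}}_W & = & \big( \{\, \mu \mid \mathrm{dom}(\mu) = \{?v_R\} \text{ and } \mu(?v_R) \in \mathrm{ALPW1}(x_L, \mathit{path}, W) \,\},\ \mathit{card1}^{(\Omega)} \big) \\[4pt]
[\![\langle ?v_L, (\mathit{path})^*, ?v_R \rangle]\!]^{\mathsf{ctxt}}_W & = & \big( \{\, \mu \mid \mathrm{dom}(\mu) = \{?v_L, ?v_R\} \text{ and } \mu(?v_L) \in \mathrm{terms}(W) \text{ and } \\
 & & \quad \mu(?v_R) \in \mathrm{ALPW1}(\mu(?v_L), \mathit{path}, W) \,\},\ \mathit{card1}^{(\Omega)} \big) \\[4pt]
[\![\langle ?v_L, (\mathit{path})^*, x_R \rangle]\!]^{\mathsf{ctxt}}_W & = & [\![\langle x_R, (\hat{}\,\mathit{path})^*, ?v_L \rangle]\!]^{\mathsf{ctxt}}_W \\[4pt]
[\![\langle x_L, (\mathit{path})^*, x_R \rangle]\!]^{\mathsf{ctxt}}_W & = & \begin{cases} \big( \{\mu_\emptyset\},\ \mathit{card1}^{(\{\mu_\emptyset\})} \big) & \text{if } \exists\, \mu \in [\![\langle x_L, (\mathit{path})^*, ?v \rangle]\!]^{\mathsf{ctxt}}_W : \mu(?v) = x_R, \\ \big( \emptyset,\ \mathit{card1}^{(\emptyset)} \big) & \text{else.} \end{cases}
\end{array}
\]

Figure 3. Context-based query semantics for SPARQL property paths over the Web
(in the base cases, $u_L$ denotes an IRI and $\ell_L$ a literal in the subject position).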

It is not difficult to see that the issue illustrated in the example exists for any triple pattern
that has a variable in the subject position. On the other hand, triple patterns whose
subject is an IRI do not have this issue. However, having an IRI in the subject position
is not a sufficient condition in general. For instance, the PP pattern $\langle \mathrm{Tim}, \hat{}\,\mathrm{knows}, ?v \rangle$
has the same issue as the pattern in Example 2 (in fact, both patterns are semantically
equivalent under context-based semantics). A question that arises is whether there exists
a property of PP patterns that can be used to distinguish between patterns that do
not have this issue (i.e., evaluating them over any WoLD is feasible) and those that do.
We shall discuss this question for the more general case of PP-based SPARQL queries.


5 SPARQL with Property Paths on the Web

After considering PP patterns in isolation, we now turn to a more expressive fragment
of SPARQL that embeds PP patterns as the basic building block and uses additional
operators on top. We define the resulting PP-based SPARQL queries, discuss the feasibility
of evaluating these queries over the Web, and introduce a syntactic property to
identify queries for which an evaluation under context-based semantics is feasible.

Function ALPW1($\gamma$, path, W)
  Input: $\gamma \in (\mathcal{I} \cup \mathcal{B} \cup \mathcal{L})$, path is a PP expression, W is a WoLD.
  1: Visited := $\emptyset$
  2: ALPW2($\gamma$, path, Visited, W)
  3: return Visited

Function ALPW2($\gamma$, path, Visited, W)
  Input: $\gamma \in (\mathcal{I} \cup \mathcal{B} \cup \mathcal{L})$, path is a PP expression,
         Visited $\subseteq (\mathcal{I} \cup \mathcal{B} \cup \mathcal{L})$, W is a WoLD.
  4: if $\gamma \notin$ Visited then
  5:   add $\gamma$ to Visited
  6:   for all $\mu \in [\![\langle ?x, \mathit{path}, ?y \rangle]\!]^{\mathsf{ctxt}}_W$ s.t. $\mu(?x) = \gamma$ do      // $?x, ?y \in \mathcal{V}$
  7:     ALPW2($\mu(?y)$, path, Visited, W)

Figure 4. Auxiliary functions used for defining context-based query semantics.


5.1 Definition 

By using the algebraic syntax of SPARQL [22], we define a graph pattern recursively
as follows: (i) Any PP pattern $\langle \alpha, \mathit{path}, \beta \rangle$ is a graph pattern; and (ii) if $P_1$ and $P_2$ are
graph patterns, then ($P_1$ AND $P_2$), ($P_1$ UNION $P_2$), and ($P_1$ OPT $P_2$) are graph patterns.^3
For any graph pattern $P$, we write $\mathcal{V}(P)$ to denote the set of all variables in $P$.

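By using PP patterns as the basic building block of graph patterns, we can readily
carry over our context-based semantics to graph patterns: For any graph pattern $P$ and
any WoLD $W$, the evaluation of $P$ over $W$ under context-based semantics is a multiset
of solution mappings, denoted by $[\![P]\!]^{\mathsf{ctxt}}_W$, that is defined recursively as follows:^4

- If $P$ is a PP pattern, then $[\![P]\!]^{\mathsf{ctxt}}_W$ is defined in Definition 3.

- If $P$ is ($P_1$ AND $P_2$), then $[\![P]\!]^{\mathsf{ctxt}}_W = [\![P_1]\!]^{\mathsf{ctxt}}_W \bowtie [\![P_2]\!]^{\mathsf{ctxt}}_W$.

- If $P$ is ($P_1$ UNION $P_2$), then $[\![P]\!]^{\mathsf{ctxt}}_W = [\![P_1]\!]^{\mathsf{ctxt}}_W \sqcup [\![P_2]\!]^{\mathsf{ctxt}}_W$.

- If $P$ is ($P_1$ OPT $P_2$), then $[\![P]\!]^{\mathsf{ctxt}}_W = \big([\![P_1]\!]^{\mathsf{ctxt}}_W \bowtie [\![P_2]\!]^{\mathsf{ctxt}}_W\big) \sqcup \big([\![P_1]\!]^{\mathsf{ctxt}}_W \setminus [\![P_2]\!]^{\mathsf{ctxt}}_W\big)$.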


5.2 Discussion 

Given a query semantics for evaluating PP-based graph patterns over a WoLD, we now 
discuss the feasibility of such evaluation. To this end, we introduce the notion of Web- 
safeness of graph patterns. Informally, graph patterns are Web-safe if evaluating them 
completely under context-based semantics is possible. Formally: 

Definition 4. A graph pattern $P$ is Web-safe if there exists an algorithm that, for any
finite WoLD $W = \langle D, \mathit{data}, \mathit{adoc} \rangle$, computes $[\![P]\!]^{\mathsf{ctxt}}_W$ by looking up only a finite number
of IRIs without assuming direct access to the sets $D$ and $\mathrm{dom}(\mathit{adoc})$.

^3 For this paper we leave out other types of SPARQL graph patterns such as filters. Adding them
is an exercise that would not have any significant implication on the following discussion.

^4 Note that the definition uses the algebra operators introduced in Section 3.1.







Example 3. Consider graph pattern $P_{E3} = (\langle \mathrm{Bob}, \mathrm{knows}, ?v \rangle \text{ AND } \langle ?v, \mathrm{knows}, \mathrm{Tim} \rangle)$.
The right sub-pattern $P_{E2} = \langle ?v, \mathrm{knows}, \mathrm{Tim} \rangle$ is not Web-safe because evaluating it
completely over the WWW is not feasible under context-based semantics (cf. Example 2).
However, the larger pattern $P_{E3}$ is Web-safe; it can be evaluated completely under
context-based semantics. For instance, a possible algorithm may first evaluate the
left sub-pattern, which is feasible because it requires the lookup of a single IRI only (the
IRI Bob). Thereafter, the evaluation of the right sub-pattern $P_{E2}$ can be reduced to looking
up a finite number of IRIs only, namely the IRIs bound to variable $?v$ in solution
mappings obtained for the left sub-pattern. Although any other IRI $u^*$ might also be
used to discover matching triples for $P_{E2}$, each of these triples has IRI $u^*$ as its subject
(which is a consequence of restricting retrieved data based on the context selector
introduced in Section 4.2). Therefore, the solution mappings resulting from such matching
triples cannot be compatible with any solution for the left sub-pattern and, thus, do
not satisfy the join condition established by the semantics of AND in pattern $P_{E3}$.
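The algorithm outlined in Example 3 can be sketched in Python as follows. The sketch is only an illustration under assumptions: `lookup` is a hypothetical callable that returns the set of triples of the document retrieved for an IRI (or an empty set if no document is retrieved), terms are plain strings, and the function is hard-wired to the pattern of the example. It records which IRIs are looked up so that the finiteness of the process can be observed.

```python
def evaluate_example3(lookup):
    """Evaluate (<Bob, knows, ?v> AND <?v, knows, Tim>) using IRI lookups only."""
    looked_up = []

    def ctx(iri):
        # Context selector: keep only triples whose subject is the looked-up IRI.
        looked_up.append(iri)
        return {t for t in lookup(iri) if t[0] == iri}

    # Left sub-pattern: a single lookup of the IRI Bob.
    left = {o for (s, p, o) in ctx("Bob") if p == "knows"}

    # Right sub-pattern: look up only the bindings for ?v obtained on the left.
    result = {v for v in sorted(left)
              if any(p == "knows" and o == "Tim" for (s, p, o) in ctx(v))}
    return result, looked_up
```

The number of lookups is one plus the number of distinct bindings for ?v from the left sub-pattern; documents of other people who know Tim are never retrieved, and they could not contribute to the join anyway.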

The example illustrates that some graph patterns are Web-safe even if some of their sub-patterns
are not. Consequently, we are interested in a decidable property that enables us to
identify Web-safe patterns, including those whose sub-patterns are not Web-safe.

Buil-Aranda et al. study a similar problem in the context of SPARQL federation
where graph patterns of the form $P_S = (\text{SERVICE } ?v\ P)$ are allowed [6]. Here, variable
$?v$ ranges over a possibly large set of IRIs, each of which represents the address
of a (remote) SPARQL service that needs to be called to assemble the complete result
of $P_S$. However, many service calls may be avoided if $P_S$ is embedded in a larger
graph pattern that allows for an evaluation during which $?v$ can be bound before evaluating
$P_S$. To tackle this problem, Buil-Aranda et al. introduce a notion of strong boundedness
of variables in graph patterns and use it to show a notion of safeness for the
evaluation of patterns like $P_S$ within larger graph patterns. The set of strongly bound
variables in a graph pattern $P$, denoted by $\mathrm{SBV}(P)$, is defined recursively as follows:

- If $P$ is a PP pattern, then $\mathrm{SBV}(P) = \mathcal{V}(P)$ (recall that $\mathcal{V}(P)$ are all variables in $P$).

- If $P$ is of the form ($P_1$ AND $P_2$), then $\mathrm{SBV}(P) = \mathrm{SBV}(P_1) \cup \mathrm{SBV}(P_2)$.

- If $P$ is of the form ($P_1$ UNION $P_2$), then $\mathrm{SBV}(P) = \mathrm{SBV}(P_1) \cap \mathrm{SBV}(P_2)$.

- If $P$ is of the form ($P_1$ OPT $P_2$), then $\mathrm{SBV}(P) = \mathrm{SBV}(P_1)$.
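The recursive definition of SBV translates directly into code. The following Python sketch assumes an ad hoc encoding that is not part of the paper: a PP pattern is a tuple `("pp", alpha, path, beta)`, a complex pattern is a tuple `(op, P1, P2)` with `op` in {"AND", "UNION", "OPT"}, and variables are strings that start with "?".

```python
def variables(p):
    """V(P): all variables occurring in graph pattern p."""
    if p[0] == "pp":
        return {t for t in (p[1], p[3]) if isinstance(t, str) and t.startswith("?")}
    return variables(p[1]) | variables(p[2])

def sbv(p):
    """SBV(P): strongly bound variables, following the four cases above."""
    op = p[0]
    if op == "pp":
        return variables(p)
    if op == "AND":
        return sbv(p[1]) | sbv(p[2])
    if op == "UNION":
        return sbv(p[1]) & sbv(p[2])
    if op == "OPT":
        return sbv(p[1])
    raise ValueError(f"unknown operator: {op!r}")
```

For example, for a UNION of ⟨?x, knows, ?y⟩ and ⟨?x, name, ?n⟩ only ?x is strongly bound, because it is the only variable that is bound in both branches.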

The idea behind the notion of strongly bound variables has already been used in earlier
work (e.g., “certain variables” [23], “output variables” [24]), and it is tempting
to adopt it for our problem. However, we note that one cannot identify Web-safe graph
patterns by using strong boundedness in a manner similar to its use in Buil-Aranda et
al.’s work alone. For instance, consider graph pattern $P_{E3}$ from Example 3. We know
that (i) $P_{E3}$ is Web-safe and that (ii) $\mathcal{V}(P_{E3}) = \{?v\}$ and also $\mathrm{SBV}(P_{E3}) = \{?v\}$. Then,
one might hypothesize that for every graph pattern $P$, if $\mathrm{SBV}(P) = \mathcal{V}(P)$, then $P$ is
Web-safe. However, the PP pattern $P_{E2} = \langle ?v, \mathrm{knows}, \mathrm{Tim} \rangle$ disproves such a hypothesis
because, even if $\mathrm{SBV}(P_{E2}) = \mathcal{V}(P_{E2})$, pattern $P_{E2}$ is not Web-safe (cf. Example 2).

We conjecture the following reason why strong boundedness cannot be used directly
for our problem. For complex patterns (i.e., patterns that are not PP patterns), the sets
of strongly bound variables of all sub-patterns are defined independently of each other,
whereas the algorithm outlined in Example 3 leverages a specific relationship between
sub-patterns. More precisely, the algorithm leverages the fact that the same variable that
is the subject of the right sub-pattern is also the object of the left sub-pattern.

Based on this observation, we introduce the notion of conditionally Web-bounded
variables, the definition of which, for complex graph patterns, is based on specific relationships
between sub-patterns. This notion shall turn out to be suitable for our case.


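Definition 5. The conditionally Web-bounded variables of a graph pattern $P$ w.r.t. a set
of variables $X$ is the subset $\mathrm{CBV}(P \mid X) \subseteq \mathcal{V}(P)$ that is defined recursively as follows:

 1) If $P$ is $\langle \alpha, u, \beta \rangle$ or $\langle \alpha, !(u_1 \mid \ldots \mid u_n), \beta \rangle$ such that $\alpha \in (\mathcal{I} \cup \mathcal{L})$ or $\alpha \in X$,
    then $\mathrm{CBV}(P \mid X) = \mathcal{V}(P)$.
 2) If $P$ is $\langle \alpha, u, \beta \rangle$ or $\langle \alpha, !(u_1 \mid \ldots \mid u_n), \beta \rangle$ such that $\alpha \notin (\mathcal{I} \cup \mathcal{L})$ and $\alpha \notin X$,
    then $\mathrm{CBV}(P \mid X) = \emptyset$.
 3) If $P$ is $\langle \alpha, (\mathit{path})^*, \beta \rangle$ such that $\alpha \in \mathcal{V}$ and $\beta \notin \mathcal{V}$,
    then $\mathrm{CBV}(P \mid X) = \mathrm{CBV}(\langle \beta, (\hat{}\,\mathit{path})^*, \alpha \rangle \mid X)$.
 4) If $P$ is $\langle \alpha, (\mathit{path})^*, \beta \rangle$ such that (i) $\alpha \notin \mathcal{V}$ or $\beta \in \mathcal{V}$, and (ii) for any two variables
    $?x, ?y \in \mathcal{V}$ it holds that $\mathrm{CBV}(\langle ?x, \mathit{path}, ?y \rangle \mid \{?x\}) = \{?x, ?y\}$,
    then $\mathrm{CBV}(P \mid X) = \mathrm{CBV}(\langle \alpha, \mathit{path}, \beta \rangle \mid X)$.
 5) If $P$ is $\langle \alpha, (\mathit{path})^*, \beta \rangle$ such that none of the above applies,
    then $\mathrm{CBV}(P \mid X) = \emptyset$.
 6) If $P$ is $\langle \alpha, \hat{}\,\mathit{path}, \beta \rangle$, with $P' = \langle \beta, \mathit{path}, \alpha \rangle$,
    then $\mathrm{CBV}(P \mid X) = \mathrm{CBV}(P' \mid X)$.
 7) If $P$ is $\langle \alpha, (\mathit{path}_1 \mid \mathit{path}_2), \beta \rangle$, with $P' = (\langle \alpha, \mathit{path}_1, \beta \rangle \text{ UNION } \langle \alpha, \mathit{path}_2, \beta \rangle)$,
    then $\mathrm{CBV}(P \mid X) = \mathrm{CBV}(P' \mid X)$.
 8) If $P$ is $\langle \alpha, \mathit{path}_1/\mathit{path}_2, \beta \rangle$ such that, for any $?v \in \mathcal{V} \setminus (X \cup \{\alpha, \beta\})$, $?v \in \mathrm{CBV}(P' \mid X)$,
    where $P' = (\langle \alpha, \mathit{path}_1, ?v \rangle \text{ AND } \langle ?v, \mathit{path}_2, \beta \rangle)$,
    then $\mathrm{CBV}(P \mid X) = \mathrm{CBV}(P' \mid X) \setminus \{?v\}$.
 9) If $P$ is $\langle \alpha, \mathit{path}_1/\mathit{path}_2, \beta \rangle$ such that none of the above applies,
    then $\mathrm{CBV}(P \mid X) = \emptyset$.
10) If $P$ is ($P_1$ AND $P_2$) such that $\mathrm{CBV}(P_1 \mid X) = \mathcal{V}(P_1)$ and $\mathrm{CBV}(P_2 \mid X) = \mathcal{V}(P_2)$,
    then $\mathrm{CBV}(P \mid X) = \mathcal{V}(P)$.
11) If $P$ is ($P_1$ AND $P_2$) such that $\mathrm{CBV}(P_1 \mid X) = \mathcal{V}(P_1)$ and $\mathrm{CBV}(P_2 \mid X \cup \mathrm{SBV}(P_1)) = \mathcal{V}(P_2)$,
    then $\mathrm{CBV}(P \mid X) = \mathcal{V}(P)$.
12) If $P$ is ($P_1$ AND $P_2$) such that $\mathrm{CBV}(P_2 \mid X) = \mathcal{V}(P_2)$ and $\mathrm{CBV}(P_1 \mid X \cup \mathrm{SBV}(P_2)) = \mathcal{V}(P_1)$,
    then $\mathrm{CBV}(P \mid X) = \mathcal{V}(P)$.
13) If $P$ is ($P_1$ AND $P_2$) such that none of the above applies,
    then $\mathrm{CBV}(P \mid X) = \emptyset$.
14) If $P$ is ($P_1$ UNION $P_2$),
    then $\mathrm{CBV}(P \mid X) = \mathrm{CBV}(P_1 \mid X) \cap \mathrm{CBV}(P_2 \mid X)$.
15) If $P$ is ($P_1$ OPT $P_2$) such that $\mathrm{CBV}(P_1 \mid X) = \mathcal{V}(P_1)$ and $\mathrm{CBV}(P_2 \mid X) = \mathcal{V}(P_2)$,
    then $\mathrm{CBV}(P \mid X) = \mathcal{V}(P)$.
16) If $P$ is ($P_1$ OPT $P_2$) such that $\mathrm{CBV}(P_1 \mid X) = \mathcal{V}(P_1)$ and $\mathrm{CBV}(P_2 \mid X \cup \mathrm{SBV}(P_1)) = \mathcal{V}(P_2)$,
    then $\mathrm{CBV}(P \mid X) = \mathcal{V}(P)$.
17) If $P$ is ($P_1$ OPT $P_2$) such that none of the above applies,
    then $\mathrm{CBV}(P \mid X) = \emptyset$.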


Example 4. For the PP pattern $P_{E2} = (?v, \text{knows}, \text{Tim})$, which is not Web-safe (as
discussed in Example 2), if we use the set $\{?v\}$ as condition, then, by line 1 in Definition 5,
it holds that $\mathrm{CBV}(P_{E2} \mid \{?v\}) = \{?v\}$. However, if we use the empty set instead,
we obtain $\mathrm{CBV}(P_{E2} \mid \emptyset) = \emptyset$ (cf. line 2 in Definition 5).

While for the non-Web-safe pattern $P_{E2}$ we thus observe $\mathrm{CBV}(P_{E2} \mid \emptyset) \neq V(P_{E2})$,
for graph pattern $P_{E3} = ((\text{Bob}, \text{knows}, ?v) \text{ AND } (?v, \text{knows}, \text{Tim}))$, which is Web-safe (cf.
Example 3), we have $\mathrm{CBV}(P_{E3} \mid \emptyset) = V(P_{E3})$. The fact that $\mathrm{CBV}(P_{E3} \mid \emptyset) = \{?v\}$ follows
from (i) $\mathrm{CBV}((\text{Bob}, \text{knows}, ?v) \mid \emptyset) = \{?v\}$, (ii) $\mathrm{SBV}((\text{Bob}, \text{knows}, ?v)) = \{?v\}$,
(iii) $\mathrm{CBV}((?v, \text{knows}, \text{Tim}) \mid \{?v\}) = \{?v\}$, and (iv) line 11 in Definition 5.

The example seems to suggest that, if all variables of a graph pattern are conditionally 
Web-bounded w.r.t. the empty set of variables, then the graph pattern is Web-safe. The 
following result verifies this hypothesis. 

Theorem 1. A graph pattern $P$ is Web-safe if $\mathrm{CBV}(P \mid \emptyset) = V(P)$.

Note 1. Due to the recursive nature of Definition 5, the condition $\mathrm{CBV}(P \mid \emptyset) = V(P)$ (as
used in Theorem 1) is decidable for any graph pattern $P$.
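To illustrate why the condition is decidable, the following is a minimal sketch of the recursion in Definition 5, restricted to lines 1, 2, and 10-14 (base PP patterns with an IRI predicate, AND, and UNION). The tuple encoding of patterns, the helper names, and the rules used for SBV (all variables of a triple-like pattern, union for AND, intersection for UNION) are assumptions made for this illustration; they are not taken from the definition above.

```python
def is_var(term):
    # Assumption: variables are strings that start with "?".
    return term.startswith("?")

def variables(p):
    """V(P): variables in subject/object position of the (sub-)patterns of p."""
    if p[0] == "tp":
        return {t for t in (p[1], p[3]) if is_var(t)}
    return variables(p[1]) | variables(p[2])

def sbv(p):
    """Strongly bound variables (assumed rules, see the lead-in)."""
    if p[0] == "tp":
        return variables(p)
    if p[0] == "and":
        return sbv(p[1]) | sbv(p[2])
    return sbv(p[1]) & sbv(p[2])  # "union"

def cbv(p, x):
    """CBV(p | x) for base patterns (lines 1-2), AND (10-13), UNION (14)."""
    x = frozenset(x)
    if p[0] == "tp":
        subject = p[1]
        if not is_var(subject) or subject in x:   # line 1
            return variables(p)
        return set()                              # line 2
    p1, p2 = p[1], p[2]
    if p[0] == "union":                           # line 14
        return cbv(p1, x) & cbv(p2, x)
    full1 = cbv(p1, x) == variables(p1)
    full2 = cbv(p2, x) == variables(p2)
    if ((full1 and full2)                                           # line 10
            or (full1 and cbv(p2, x | sbv(p1)) == variables(p2))    # line 11
            or (full2 and cbv(p1, x | sbv(p2)) == variables(p1))):  # line 12
        return variables(p)
    return set()                                  # line 13

def satisfies_theorem_1(p):
    """Sufficient (not necessary) syntactic check for Web-safeness."""
    return cbv(p, set()) == variables(p)

# The patterns of Example 4.
P_E2 = ("tp", "?v", "knows", "Tim")
P_E3 = ("and", ("tp", "Bob", "knows", "?v"), P_E2)
```

Every recursive call is made on a strict sub-pattern, so the recursion terminates for any pattern built from these constructors; this is the reason the check can be run by a query system before it starts dereferencing IRIs.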

We prove Theorem 1 based on an algorithm that evaluates graph patterns recursively
by passing (intermediate) solution mappings to recursive calls. To capture the desired
results of each recursive call formally, we introduce a special evaluation function for a
graph pattern $P$ over a WoLD $W$ that takes a solution mapping $\mu$ as input and returns
only the solutions for $P$ over $W$ that are compatible with $\mu$.





Definition 6. Let $P$ be a graph pattern, let $W$ be a WoLD, and let $(\Omega, card) = \llbracket P \rrbracket^{ctx}_{W}$.
Given a solution mapping $\mu$, the $\mu$-restricted evaluation of $P$ over $W$ under context-based
semantics, denoted by $\llbracket P \mid \mu \rrbracket^{ctx}_{W}$, is the multiset of solution mappings $(\Omega', card')$
with $\Omega' = \{\mu' \in \Omega \mid \mu' \sim \mu\}$ and $card'(\mu') = card(\mu')$ for all $\mu' \in \Omega'$.
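The restriction in Definition 6 is a plain filter on a multiset. A minimal sketch, assuming (for illustration only) that a multiset of solution mappings is a `Counter` whose keys are frozensets of (variable, term) pairs; the names `compatible` and `restrict` are hypothetical:

```python
from collections import Counter

def compatible(mu1, mu2):
    """Two solution mappings are compatible if they agree on shared variables."""
    return all(mu2[var] == val for var, val in mu1.items() if var in mu2)

def restrict(result, mu):
    """Keep the mappings of `result` that are compatible with `mu`;
    cardinalities of the kept mappings are left unchanged."""
    return Counter({m: card for m, card in result.items()
                    if compatible(dict(m), mu)})

# Hypothetical unrestricted result with two mappings for ?v.
result = Counter({
    frozenset({("?v", "Alice")}): 2,
    frozenset({("?v", "Carol")}): 1,
})
restricted = restrict(result, {"?v": "Alice"})
```

Restricting by the empty mapping `{}` returns the input unchanged, since the empty mapping is compatible with every solution mapping.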

The following lemma shows the existence of the aforementioned recursive algorithm. 

Lemma 1. Let $P$ be a graph pattern and let $\mu_{in}$ be a solution mapping. If it holds that
$\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) = V(P)$, there exists an algorithm that, for any finite WoLD $W$,
computes $\llbracket P \mid \mu_{in} \rrbracket^{ctx}_{W}$ by looking up a finite number of IRIs only.

Before providing the proof of the lemma (and of Theorem 1), we point out two important
properties of Definition 6. First, it is easily seen that, for any graph pattern $P$
and WoLD $W$, $\llbracket P \mid \mu_\emptyset \rrbracket^{ctx}_{W} = \llbracket P \rrbracket^{ctx}_{W}$, where $\mu_\emptyset$ is the empty solution mapping (i.e.,
$\mathrm{dom}(\mu_\emptyset) = \emptyset$). Consequently, given an algorithm, say A, that has the properties of the
algorithm described by Lemma 1, a trivial algorithm that can be used to prove Theorem 1
may simply call algorithm A with the empty solution mapping and return the
result of this call (we shall elaborate more on this approach in the proof of Theorem 1
below). Second, for any PP pattern $P = (\alpha, path, \beta)$ and WoLD $W$, if $\alpha$ is a variable and
$path$ is a base PP expression (i.e., one of the first two cases in the grammar in Section 3.1),
then $\llbracket P \mid \mu \rrbracket^{ctx}_{W}$ is empty for every solution mapping $\mu$ that binds (variable) $\alpha$
to a literal or a blank node. Formally, we show the latter as follows.

Lemma 2. Let $P$ be a PP pattern of the form $(?v, u, \beta)$ or $(?v, !(u_1 | \dots | u_n), \beta)$ with
$?v \in V$ and $u, u_1, \dots, u_n \in I$, and let $\mu$ be a solution mapping. If $?v \in \mathrm{dom}(\mu)$ and
$\mu(?v) \in (B \cup L)$, then, for any WoLD $W$, $\llbracket P \mid \mu \rrbracket^{ctx}_{W}$ is the empty multiset.

Proof (Lemma 2). Recall that, for any IRI $u$ and any WoLD $W$, context $C^{W}(u)$ contains
only triples that have IRI $u$ as their subject. As a consequence, for any WoLD $W$, every
solution mapping $\mu' \in \llbracket P \rrbracket^{ctx}_{W}$ binds variable $?v$ to some IRI (and never to a literal or
blank node); i.e., $\mu'(?v) \in I$. Therefore, if $?v \in \mathrm{dom}(\mu)$ and $\mu(?v) \in (B \cup L)$, then $\mu$
cannot be compatible with any $\mu' \in \llbracket P \rrbracket^{ctx}_{W}$ and, thus, $\llbracket P \mid \mu \rrbracket^{ctx}_{W}$ is empty. □

We use Lemma 2 to prove Lemma 1 as follows. 

Proof idea (Lemma 1). We prove the lemma by induction on the possible structure of
graph pattern $P$. For the proof, we provide Algorithm 1 and show that this (recursive)
algorithm has the desired properties for any possible graph pattern (i.e., any case of
the induction, including the base case). Due to space limitations, in this paper we only
present a fragment of the algorithm and highlight essential properties thereof. The given
fragment covers the base case (lines 1-11) and one pivotal case of the induction step,
namely, graph patterns of the form $(P_1 \text{ AND } P_2)$ (lines 57-72). The complete version of
the algorithm and the full proof can be found in the Appendix.

For the base case, Algorithm 1 looks up at most one IRI (cf. lines 2-5). The crux of
showing that the returned result is sound and complete is Lemma 2 and the fact that the
only possible context in which a triple $(s, p, o)$ with $s \in I$ can be found is $C^{W}(s)$.

For graph patterns of the form $(P_1 \text{ AND } P_2)$ consider lines 57-72. By using Definition 5,
we show $\mathrm{CBV}(P_i \mid \mathrm{dom}(\mu_{in})) = V(P_i)$ and $\mathrm{CBV}(P_j \mid \mathrm{dom}(\mu_{in}) \cup \mathrm{dom}(\mu)) = V(P_j)$


Algorithm 1 EvalCtxBased(P, μ_in), which computes ⟦P | μ_in⟧^ctx_W

 1: if P is of the form (α, u, β) or P is of the form (α, !(u_1 | ... | u_n), β) then
 2:    if α ∈ I then u' := α
 3:    else if α ∈ V and α ∈ dom(μ_in) and μ_in(α) ∈ I then u' := μ_in(α)
 4:    else u' := null
 5:    if u' is an IRI and looking it up results in retrieving a document, say d, then
 6:       G := the set of triples in d (use a fresh set of blank node identifiers when parsing d)
 7:       G' := {(s, p, o) ∈ G | s = u'}
 8:       (Ω, card) := ⟦P⟧_G' (⟦P⟧_G' can be computed by using any algorithm that
                                implements the standard SPARQL evaluation function)
 9:       return a new multiset (Ω', card') with Ω' = {μ' ∈ Ω | μ' ~ μ_in} and
                                card'(μ') = card(μ') for all μ' ∈ Ω'
10:    else
11:       return a new empty multiset (Ω, card) with Ω = ∅ and dom(card) = ∅
    ...
57: else if P is of the form (P_1 AND P_2) then
58:    if CBV(P_1 | dom(μ_in)) = V(P_1) then i := 1; j := 2 else i := 2; j := 1
59:    Create a new empty multiset M = (Ω, card) with Ω = ∅ and dom(card) = ∅
60:    (Ω^Pi, card^Pi) := EvalCtxBased(P_i, μ_in)
61:    for all μ ∈ Ω^Pi do
62:       (Ω^Pj, card^Pj) := EvalCtxBased(P_j, μ_in ∪ μ)
63:       for all μ' ∈ Ω^Pj do
64:          μ* := μ ∪ μ'
65:          k := card^Pi(μ) · card^Pj(μ')
66:          if μ* ∈ Ω then
67:             old := card(μ*)
68:             Adjust card such that card(μ*) = k + old
69:          else
70:             Adjust card such that card(μ*) = k
71:             Add μ* to Ω
72:    return M


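for all $\mu \in \Omega^{P_i}$. Therefore, by induction, all recursive calls (lines 60 and 62) look up
a finite number of IRIs and return correct results; i.e., $(\Omega^{P_i}, card^{P_i}) = \llbracket P_i \mid \mu_{in} \rrbracket^{ctx}_{W}$
and $(\Omega^{P_j}, card^{P_j}) = \llbracket P_j \mid \mu_{in} \cup \mu \rrbracket^{ctx}_{W}$ for all $\mu \in \Omega^{P_i}$. Then, since each $\mu \in \Omega^{P_i}$ is
compatible with all $\mu' \in \Omega^{P_j}$ and all processed solution mappings are compatible with
$\mu_{in}$, it is easily verified that the computed result is $\llbracket (P_1 \text{ AND } P_2) \mid \mu_{in} \rrbracket^{ctx}_{W}$. □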
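The two cases discussed in the proof idea can be sketched in executable form. The following is only an illustration under strong simplifying assumptions, not the algorithm from the Appendix: the Web is a hypothetical in-memory dictionary `WEB` from IRIs to the triples of the retrieved document, every non-variable term is treated as an IRI (no literals or blank nodes), only base patterns with a single IRI predicate and AND are supported, and the test in line 58 is replaced by a helper `cbv_full` that covers exactly these two pattern forms (taking SBV of such patterns to be all of their variables).

```python
from collections import Counter

# Hypothetical toy Web: IRI -> triples of the document obtained by looking it up.
WEB = {
    "Bob":   [("Bob", "knows", "Alice"), ("Bob", "knows", "Carol")],
    "Alice": [("Alice", "knows", "Tim")],
    "Carol": [("Carol", "knows", "Dave")],
}
LOOKUPS = []  # every IRI lookup is recorded here

def is_var(t):
    return t.startswith("?")

def variables(p):
    if p[0] == "tp":
        return {t for t in (p[1], p[3]) if is_var(t)}
    return variables(p[1]) | variables(p[2])

def cbv_full(p, bound):
    """True iff CBV(p | bound) = V(p); base patterns and AND only."""
    if p[0] == "tp":
        return not is_var(p[1]) or p[1] in bound
    f1, f2 = cbv_full(p[1], bound), cbv_full(p[2], bound)
    return ((f1 and f2)
            or (f1 and cbv_full(p[2], bound | variables(p[1])))
            or (f2 and cbv_full(p[1], bound | variables(p[2]))))

def compatible(m1, m2):
    return all(m2.get(k, v) == v for k, v in m1.items())

def eval_ctx(p, mu_in):
    """Sketch of EvalCtxBased(P, mu_in); returns a Counter of frozenset mappings."""
    if p[0] == "tp":                                  # lines 1-11
        _, s, pred, o = p
        iri = mu_in.get(s) if is_var(s) else s        # lines 2-4
        if iri is None:
            return Counter()
        LOOKUPS.append(iri)                           # line 5: the only lookup
        result = Counter()
        for (s2, p2, o2) in set(WEB.get(iri, [])):
            if s2 != iri or p2 != pred:               # line 7: context of iri
                continue
            mu = {s: s2} if is_var(s) else {}
            if is_var(o):
                if mu.get(o, o2) != o2:
                    continue
                mu[o] = o2
            elif o != o2:
                continue
            if compatible(mu, mu_in):                 # line 9
                result[frozenset(mu.items())] += 1
        return result
    _, p1, p2 = p                                     # lines 57-72
    pi, pj = (p1, p2) if cbv_full(p1, set(mu_in)) else (p2, p1)  # line 58
    result = Counter()
    for m, card_i in eval_ctx(pi, mu_in).items():     # lines 60-61
        mu = dict(m)
        for m2, card_j in eval_ctx(pj, {**mu_in, **mu}).items():  # line 62
            merged = frozenset({**mu, **dict(m2)}.items())        # line 64
            result[merged] += card_i * card_j         # lines 65-71
    return result

# The Web-safe pattern of Example 3, with the sub-patterns deliberately swapped.
P_E3 = ("and", ("tp", "?v", "knows", "Tim"), ("tp", "Bob", "knows", "?v"))
answers = eval_ctx(P_E3, {})
```

In this toy run the evaluation starts with the sub-pattern whose subject is the IRI `Bob`, and each binding for `?v` it produces determines the single IRI that is looked up for the other sub-pattern. The number of lookups is therefore bounded by one plus the number of intermediate solutions.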

We are now ready to prove Theorem 1, for which we use Lemma 1, or more precisely 
the algorithm that we introduce in the proof of the lemma. 

Proof (Theorem 1). Let $P$ be a graph pattern s.t. $\mathrm{CBV}(P \mid \emptyset) = V(P)$. Then, given the
empty solution mapping $\mu_\emptyset$ with $\mathrm{dom}(\mu_\emptyset) = \emptyset$, we have $\mathrm{CBV}(P \mid \mathrm{dom}(\mu_\emptyset)) = V(P)$.
Therefore, by our proof of Lemma 1 we know that, for any finite WoLD $W$, Algorithm 1
computes $\llbracket P \mid \mu_\emptyset \rrbracket^{ctx}_{W}$ by looking up a finite number of IRIs. We also know that
the empty solution mapping is compatible with any solution mapping. Consequently, by
Definition 6, $\llbracket P \mid \mu_\emptyset \rrbracket^{ctx}_{W} = \llbracket P \rrbracket^{ctx}_{W}$ for any WoLD $W$. Hence, by passing the empty
solution mapping to it, Algorithm 1 can be used to compute $\llbracket P \rrbracket^{ctx}_{W}$ for any finite WoLD $W$,
and during this computation the algorithm looks up a finite number of IRIs only. □

While the condition in Theorem 1 is sufficient to identify Web-safe graph patterns, the
question that remains is whether it is a necessary condition (in which case it could be
used to decide Web-safeness of all graph patterns). Unfortunately, the answer is no.

Example 5. Consider the graph pattern $P = (P_1 \text{ UNION } P_2)$ with $P_1 = (u_1, p_1, ?x)$
and $P_2 = (u_2, p_2, ?y)$. We note that $\mathrm{CBV}(P_1 \mid \emptyset) = \{?x\}$ and $\mathrm{CBV}(P_2 \mid \emptyset) = \{?y\}$, and,
thus, $\mathrm{CBV}(P \mid \emptyset) = \emptyset$. Hence, the pattern does not satisfy the condition in Theorem 1.
Nonetheless, it is easy to see that there exists a (sound and complete) algorithm that, for
any WoLD $W$, computes $\llbracket P \rrbracket^{ctx}_{W}$ by looking up a finite number of IRIs only. For instance,
such an algorithm, say A, may first use two other algorithms that compute $\llbracket P_1 \rrbracket^{ctx}_{W}$ and
$\llbracket P_2 \rrbracket^{ctx}_{W}$ by looking up a finite number of IRIs, respectively. Such algorithms exist by
Theorem 1, because $\mathrm{CBV}(P_1 \mid \emptyset) = V(P_1)$ and $\mathrm{CBV}(P_2 \mid \emptyset) = V(P_2)$. Finally, algorithm
A can generate the (sound and complete) query result $\llbracket P \rrbracket^{ctx}_{W}$ by computing the
multiset union $\llbracket P_1 \rrbracket^{ctx}_{W} \cup \llbracket P_2 \rrbracket^{ctx}_{W}$, which requires no additional IRI lookups.
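The last step of algorithm A is purely local. As a small sketch, assuming the two partial results have already been computed and are represented as `Counter` objects over frozenset-encoded solution mappings (an encoding chosen for this illustration, with made-up bindings):

```python
from collections import Counter

def multiset_union(left, right):
    """Multiset union: cardinalities of mappings found on both sides add up."""
    return left + right

# Hypothetical results for P1 = (u1, p1, ?x) and P2 = (u2, p2, ?y).
res_p1 = Counter({frozenset({("?x", "a")}): 1, frozenset({("?x", "b")}): 2})
res_p2 = Counter({frozenset({("?y", "c")}): 1})
res_p = multiset_union(res_p1, res_p2)
```

No function in this step touches the Web, which is why the union adds no IRI lookups to those already spent on the two sub-patterns.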

Remark 1. The example illustrates that “only if” cannot be shown in Theorem 1. It 
remains an open question whether there exists an alternative condition for Web-safeness 
that is both sufficient and necessary (and decidable). 


6 Concluding Remarks and Future Work 

This paper studies the problem of extending the scope of SPARQL property paths to
query Linked Data that is distributed on the WWW. We have proposed a context-based
query semantics and analyzed its peculiarities. Our perhaps most interesting finding is
that there exist queries whose evaluation over the WWW is not feasible. We studied this
aspect and introduced a decidable syntactic property for identifying feasible queries.

We believe that the presented work provides valuable input to a wider discussion
about defining a language for accessing Linked Data on the WWW. In this context, there
are several directions for future research such as the following three. First, studying a
more expressive navigational core for property paths over the Web; e.g., along the lines
of other navigational languages such as nSPARQL [21] or NautiLOD [8]. Second,
investigating relationships between navigational queries and SPARQL federation. Third,
while the aim of this paper was to introduce a formal foundation for answering SPARQL
queries with PPs over Linked Data on the WWW, an investigation of how systems may
efficiently implement the machinery developed in this paper is certainly interesting.


References 

1. Abiteboul, S., Vianu, V.: Queries and Computation on the Web. Theor. Comput. Sci. 239(2),
231-255 (2000)

2. Alkhateeb, F., Baget, J.F., Euzenat, J.: Extending SPARQL with Regular Expression Patterns
(for querying RDF). J. Web Sem. 7(2), 57-73 (2009)

3. Arenas, M., Conca, S., Perez, J.: Counting Beyond a Yottabyte, or how SPARQL 1.1 Property
Paths will Prevent Adoption of the Standard. In: Proceedings of the 21st International
Conference on World Wide Web (2012)

4. Berners-Lee, T.: Design issues: Linked Data. Online (Jul 2006)

5. Bouquet, P., Ghidini, C., Serafini, L.: Querying The Web Of Data: A Formal Approach. In:
Proceedings of the 4th Asian Semantic Web Conference (2009)

6. Buil-Aranda, C., Arenas, M., Corcho, O., Polleres, A.: Federating Queries in SPARQL 1.1:
Syntax, Semantics and Evaluation. Journal on Web Semantics 18(1), 1-17 (2013)

7. Fielding, R., Gettys, J., Mogul, J.C., Frystyk, H., Masinter, L., Leach, P.J., Berners-Lee, T.:
Hypertext Transfer Protocol - HTTP/1.1. RFC 2616 (Jun 1999)

8. Fionda, V., Gutierrez, C., Pirro, G.: Semantic Navigation on the Web of Data: Specification
of Routes, Web Fragments and Actions. In: Proceedings of the 21st International Conference
on the World Wide Web (2012)

9. Fionda, V., Pirro, G., Consens, M.: Extended Property Paths: Writing More SPARQL Queries 
in a Succinct Way. In: Proceedings of the 28th AAAI Conference on Artificial Intelligence 
(AAAI) (2015) 

10. Fionda, V., Pirro, G., Gutierrez, C.: NautiLOD: A Formal Language for the Web of Data 
Graph. ACM Trans. Web 9(1) (Jan 2015) 

11. Florescu, D., Levy, A., Mendelzon, A.: Database Techniques for the World-Wide Web: A 
Survey. SIGMOD Rec. 27, 59-74 (1998) 

12. Harris, S., Seaborne, A.: SPARQL 1.1 Query Language. W3C Recommendation (2013)

13. Harth, A., Speiser, S.: On Completeness Classes for Query Evaluation on Linked Data. In: 
Proceedings of the 26th AAAI Conference (2012) 

14. Hartig, O.: SPARQL for a Web of Linked Data: Semantics and Computability. In: Proceed¬ 
ings of the 9th Extended Semantic Web Conference (2012) 

15. Hartig, O., Pirro, G.: A Context-Based Semantics for SPARQL Property Paths over the Web.
In: Proceedings of the 12th Extended Semantic Web Conference (2015)

16. Klyne, G., Carroll, J.J.: Resource Description Framework (RDF): Concepts and Abstract 
Syntax (2006) 

17. Kochut, K.J., Janik, M.: SPARQLeR: Extended SPARQL for Semantic Association Discov¬ 
ery. In: The Semantic Web: Research and Applications, pp. 145-159. Springer (2007) 

18. Konopnicki, D., Shmueli, O.: Information Gathering in the World-Wide Web: The W3QL
Query Language and the W3QS System. ACM Transactions on Database Systems 23(4),
369-410 (Dec 1998)

19. Losemann, K., Martens, W.: The Complexity of Evaluating Path Expressions in SPARQL. In:
Proceedings of the 31st ACM Symposium on Principles of Database Systems (2012)

20. Mendelzon, A.O., Mihaila, G.A., Milo, T.: Querying the World Wide Web. Int. J.
on Digital Libraries, vol. 1, pp. 54-97 (1997)

21. Perez, J., Arenas, M., Gutierrez, C.: nSPARQL: A Navigational Language for RDF. Journal 
on Web Semantics 8(4), 255-270 (2010) 

22. Perez, J., Arenas, M., Gutierrez, C.: Semantics and Complexity of SPARQL. ACM Transac¬ 
tions on Database Systems (TODS) 34(3) (2009) 

23. Schmidt, M., Meier, M., Lausen, G.: Foundations of SPARQL Query Optimization. In: Pro¬ 
ceedings of the 13th International Conference on Database Theory (2010) 

24. Toman, D., Weddell, G.E.: Fundamentals of Physical Design and Query Compilation. Syn¬ 
thesis Lectures on Data Management, Morgan & Claypool Publishers (2011) 

25. Umbrich, J., Hogan, A., Polleres, A., Decker, S.: Link Traversal Querying for a diverse Web 
of Data. Semantic Web Journal (2014) 

26. Wood, P.T.: Query Languages for Graph Databases. SIGMOD Rec. 41(1) (2012) 


A Proof of Lemma 1 


Suppose $P$ is a graph pattern and $\mu_{in}$ is a solution mapping such that

$\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) = V(P)$.

We have to show that there exists a (sound and complete) algorithm that, for any finite
WoLD $W$, computes $\llbracket P \mid \mu_{in} \rrbracket^{ctx}_{W}$ by looking up a finite number of IRIs only. For the
proof we provide Algorithm 1 and show by induction on the possible structure of graph
pattern $P$ that this (recursive) algorithm has the desired properties.

For the proof we use the following fact, which is easily verified by Definition 5. 

Fact 1. Let $P$ be a graph pattern, and let $X \subseteq V$ and $X' \subseteq V$ be two (nonempty) sets
of variables. Then, $\mathrm{CBV}(P \mid X) \subseteq \mathrm{CBV}(P \mid X \cup X')$.

A.1 Base Case

Suppose $P$ is either a PP pattern $(\alpha, u, \beta)$ or a PP pattern $(\alpha, !(u_1 | \dots | u_n), \beta)$ (with
$u, u_1, \dots, u_n \in I$). The corresponding fragment of Algorithm 1 for this case is given as
follows.

 1: if P is of the form (α, u, β) or P is of the form (α, !(u_1 | ... | u_n), β) then
 2:    if α ∈ I then u' := α
 3:    else if α ∈ V and α ∈ dom(μ_in) and μ_in(α) ∈ I then u' := μ_in(α)
 4:    else u' := null
 5:    if u' is an IRI and looking it up results in retrieving a document, say d, then
 6:       G := the set of triples in d (use a fresh set of blank node identifiers when parsing d)
 7:       G' := {(s, p, o) ∈ G | s = u'}
 8:       (Ω, card) := ⟦P⟧_G' (⟦P⟧_G' can be computed by using any algorithm that
                                implements the standard SPARQL evaluation function)
 9:       return a new multiset (Ω', card') with Ω' = {μ' ∈ Ω | μ' ~ μ_in} and
                                card'(μ') = card(μ') for all μ' ∈ Ω'
10:    else
11:       return a new empty multiset (Ω, card) with Ω = ∅ and dom(card) = ∅

We distinguish three cases (which correspond to the three cases in lines 2-4): 

1. If $\alpha$ is an IRI (i.e., $\alpha \in I$), Algorithm 1 looks up this IRI, which either may result
in retrieving a document or not. In the following, we consider both cases:

(a) If the lookup results in retrieving a document $d$, Algorithm 1 executes lines 6
to 9, and we know that $d \in D$ and $adoc(\alpha) = d$ hold for the queried WoLD
$W = (D, data, adoc)$. In this case the algorithm selects specific triples from
document $d$ to obtain an RDF graph $G'$ (cf. line 7). Since this selection resembles
the application of the context selector (cf. Section 4.2), it holds that
$G' = C^{W}(\alpha)$. Then, it is easily seen that, by using a standard evaluation
algorithm for the computation in line 8, multiset $(\Omega, card)$ is equivalent to query
result $\llbracket P \rrbracket^{ctx}_{W}$ (cf. Figure 3) and $(\Omega', card')$ is equivalent to $\llbracket P \mid \mu_{in} \rrbracket^{ctx}_{W}$ (cf.
Definition 6).

(b) If the lookup of IRI $\alpha$ does not result in retrieving a document, Algorithm 1
executes line 11, and we know that $\alpha \notin \mathrm{dom}(adoc)$ holds for the queried WoLD
$W = (D, data, adoc)$. As a consequence, $C^{W}(\alpha) = \emptyset$ (cf. Section 4.2). Then,
by Definition 3, $\llbracket P \rrbracket^{ctx}_{W}$ is the empty multiset of solution mappings, and so is
$\llbracket P \mid \mu_{in} \rrbracket^{ctx}_{W}$ (cf. Definition 6). Hence, the empty multiset of solution mappings
returned by Algorithm 1 (line 11) is the correct result in this case.

2. If $\alpha$ is a variable and solution mapping $\mu_{in}$ binds this variable to an IRI (i.e., $\alpha \in V$
and $\mu_{in}(\alpha) \in I$), then Algorithm 1 looks up this IRI, which either may result in
retrieving a document or not. In the following, we consider both cases:

(a) If the lookup results in retrieving a document $d$, Algorithm 1 executes lines 6
to 9, and we know that $d \in D$ and $adoc(\mu_{in}(\alpha)) = d$ hold for the queried
WoLD $W = (D, data, adoc)$. Similar to case 1a before, we can show for the
RDF graph $G'$ constructed in line 7 that $G' = C^{W}(\mu_{in}(\alpha))$ holds. Since $\alpha$
is a variable, by Definition 3, we would have to search for triples that match
triple pattern $tp = \mu_{in}[(\alpha, p, \beta)]$ (with $p = u$, resp. $p \in I \setminus \{u_1, \dots, u_n\}$) in
the context of all IRIs $u^* \in I$. However, since $\mu_{in}(\alpha)$ is an IRI, the
only context that can contain such matching triples is $G' = C^{W}(\mu_{in}(\alpha))$ (cf.
Section 4.2). As a consequence, the solution mappings in $\llbracket P \rrbracket^{ctx}_{W}$ that are compatible
with $\mu_{in}$ coincide with those in $\llbracket P \rrbracket_{G'}$ that are compatible with $\mu_{in}$ and, thus, the
multiset of solution mappings $(\Omega', card')$ returned in line 9 is equivalent to
$\llbracket P \mid \mu_{in} \rrbracket^{ctx}_{W}$ (cf. Definition 6).

(b) If the lookup of IRI $\mu_{in}(\alpha)$ does not result in retrieving a document, Algorithm 1
executes line 11, and we know that $\mu_{in}(\alpha) \notin \mathrm{dom}(adoc)$ holds for the queried
WoLD $W = (D, data, adoc)$. As in case 2a, the only context that can contain
matching triples for triple pattern $\mu_{in}[(\alpha, p, \beta)]$ is $C^{W}(\mu_{in}(\alpha))$. However,
$C^{W}(\mu_{in}(\alpha)) = \emptyset$ because $\mu_{in}(\alpha) \notin \mathrm{dom}(adoc)$. Thus, no context contains a
triple that matches this triple pattern (cf. Definition 3), and so $\llbracket P \mid \mu_{in} \rrbracket^{ctx}_{W}$ is the
empty multiset of solution mappings (cf. Definition 6). Hence, the empty multiset of
solution mappings returned by Algorithm 1 (line 11) is the correct result in this case.

3. If none of the other two cases holds, then either (i) $\alpha$ is a variable and solution
mapping $\mu_{in}$ binds this variable to a blank node or to a literal (i.e., $\alpha \in V$ and
$\mu_{in}(\alpha) \in B \cup L$) or (ii) $\alpha$ is a literal. Note that, due to $\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) = V(P)$,
by Definition 5, we can rule out a third possibility of $\alpha$ being a variable that is not
bound at all by solution mapping $\mu_{in}$ (i.e., $\alpha \in V$ and $\alpha \notin \mathrm{dom}(\mu_{in})$). Algorithm 1
executes line 11 and returns the empty multiset of solution mappings. In the following,
we show that this is the correct result for each of the two (possible) sub-cases:

(a) If $\alpha \in V$ and $\mu_{in}(\alpha) \in B \cup L$, then, by Lemma 2, query result $\llbracket P \mid \mu_{in} \rrbracket^{ctx}_{W}$ is
the empty multiset.

(b) If $\alpha \in L$, then, by Definition 3, query result $\llbracket P \rrbracket^{ctx}_{W}$ is the empty multiset of
solution mappings, and so is $\llbracket P \mid \mu_{in} \rrbracket^{ctx}_{W}$.

Our discussion shows that, for each of the three cases, Algorithm 1 looks up a finite
number of IRIs (that is, one in the first and in the second case, respectively, and none in
the third case) and returns the correct result.


A.2 Induction Step 

We now discuss the induction step, for which we distinguish ten cases. 


Case 1: Suppose $P$ is a PP pattern $(\alpha, \hat{}path, \beta)$.

The fragment of Algorithm 1 that covers this case is given as follows.

12: if P is of the form (α, ^path, β) then
13:    Create a PP pattern P' = (β, path, α)
14:    return EvalCtxBased(P', μ_in)

Let $P' = (\beta, path, \alpha)$ be the PP pattern created in line 13. To show that, for any finite
WoLD $W$, Algorithm 1 computes $\llbracket P \mid \mu_{in} \rrbracket^{ctx}_{W}$ by looking up a finite number of IRIs
only, it suffices to prove the following two claims:

Claim 1: $\llbracket P \mid \mu_{in} \rrbracket^{ctx}_{W} = \llbracket P' \mid \mu_{in} \rrbracket^{ctx}_{W}$ for any WoLD $W$.

Claim 2: $\mathrm{CBV}(P' \mid \mathrm{dom}(\mu_{in})) = V(P')$.

Then, by induction it follows that Algorithm 1 has the desired properties for pattern $P$.

To verify the first claim we recall that $\llbracket P \rrbracket^{ctx}_{W} = \llbracket P' \rrbracket^{ctx}_{W}$ holds for any WoLD $W$ (cf.
Definition 3). By using this equivalence and Definition 6, we obtain Claim 1.

To prove Claim 2 we use the fact that

$\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) = V(P)$.

Since $\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) = \mathrm{CBV}(P' \mid \mathrm{dom}(\mu_{in}))$ (cf. Definition 5), we thus have

$\mathrm{CBV}(P' \mid \mathrm{dom}(\mu_{in})) = V(P)$.

Then, by using $V(P) = V(P')$, we obtain

$\mathrm{CBV}(P' \mid \mathrm{dom}(\mu_{in})) = V(P')$.


Case 2: Suppose $P$ is a PP pattern $(\alpha, path_1/path_2, \beta)$.

The fragment of Algorithm 1 that covers this case is given as follows.

15: if P is of the form (α, path_1/path_2, β) then
16:    Create a graph pattern P' = ((α, path_1, ?v) AND (?v, path_2, β))
                                such that ?v ∈ V \ (dom(μ_in) ∪ {α, β})
17:    M := EvalCtxBased(P', μ_in)
18:    M' := π_{{α,β}∩V}(M) (this multiset projection is defined in Section 3.1 and
                                can be computed by using a standard algorithm)
19:    return M'

Let $P' = ((\alpha, path_1, ?v) \text{ AND } (?v, path_2, \beta))$ be the graph pattern created in line 16;
i.e., $?v \in V \setminus (\mathrm{dom}(\mu_{in}) \cup \{\alpha, \beta\})$ and, thus, $?v \notin \mathrm{dom}(\mu_{in})$. To show that, for any
finite WoLD $W$, Algorithm 1 computes $\llbracket P \mid \mu_{in} \rrbracket^{ctx}_{W}$ by looking up a finite number of
IRIs only, it suffices to prove the following two claims:

Claim 1: $\llbracket P \mid \mu_{in} \rrbracket^{ctx}_{W} = \pi_{\{\alpha,\beta\} \cap V}(\llbracket P' \mid \mu_{in} \rrbracket^{ctx}_{W})$ for any WoLD $W$.

Claim 2: $\mathrm{CBV}(P' \mid \mathrm{dom}(\mu_{in})) = V(P')$.

Then, by induction it follows that Algorithm 1 has the desired properties for pattern $P$.

To verify the first claim we recall that $\llbracket P \rrbracket^{ctx}_{W} = \pi_{\{\alpha,\beta\} \cap V}(\llbracket P' \rrbracket^{ctx}_{W})$ holds for any
WoLD $W$ (cf. Definition 3). By using this equivalence, the fact that $?v \notin \mathrm{dom}(\mu_{in})$,
and Definition 6, we obtain Claim 1.

To prove Claim 2 we recall that $\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) = V(P)$. Therefore, by Definition 5,
it holds that $?v \in \mathrm{CBV}(P' \mid \mathrm{dom}(\mu_{in}))$ and:

$\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) = \mathrm{CBV}(P' \mid \mathrm{dom}(\mu_{in})) \setminus \{?v\}$.

Due to the former, we can rewrite the latter to obtain:

$\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) \cup \{?v\} = \mathrm{CBV}(P' \mid \mathrm{dom}(\mu_{in}))$.

By using $\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) = V(P)$ again, we rewrite to:

$V(P) \cup \{?v\} = \mathrm{CBV}(P' \mid \mathrm{dom}(\mu_{in}))$,

and, with $V(P) \cup \{?v\} = V(P')$,

$V(P') = \mathrm{CBV}(P' \mid \mathrm{dom}(\mu_{in}))$.

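Case 3: Suppose $P$ is a PP pattern $(\alpha, (path_1 | path_2), \beta)$.

This case is covered by the following fragment of Algorithm 1.

20: if P is of the form (α, path_1 | path_2, β) then
21:    Create graph pattern P' = ((α, path_1, β) UNION (α, path_2, β))
22:    M := EvalCtxBased(P', μ_in)
23:    return M

Due to the semantics of the operator UNION (as given in Section 5.1), for the graph
pattern $P'$ constructed in line 21 of Algorithm 1 and any WoLD $W$, it holds that

$$\llbracket P' \rrbracket^{ctx}_{W} = \llbracket (\alpha, path_1, \beta) \rrbracket^{ctx}_{W} \cup \llbracket (\alpha, path_2, \beta) \rrbracket^{ctx}_{W}.$$

Furthermore, by Definition 3, for any WoLD $W$, it holds that

$$\llbracket (\alpha, path_1 | path_2, \beta) \rrbracket^{ctx}_{W} = \llbracket (\alpha, path_1, \beta) \rrbracket^{ctx}_{W} \cup \llbracket (\alpha, path_2, \beta) \rrbracket^{ctx}_{W}.$$

Hence, for any WoLD $W$, $\llbracket P \rrbracket^{ctx}_{W} = \llbracket P' \rrbracket^{ctx}_{W}$ and, thus,

$$\llbracket P \mid \mu_{in} \rrbracket^{ctx}_{W} = \llbracket P' \mid \mu_{in} \rrbracket^{ctx}_{W}. \qquad (1)$$

Moreover, by using (i) the fact that $\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) = V(P)$, (ii) $V(P) = V(P')$,
and (iii) $\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) = \mathrm{CBV}(P' \mid \mathrm{dom}(\mu_{in}))$ (cf. Definition 5), we can show

$$\mathrm{CBV}(P' \mid \mathrm{dom}(\mu_{in})) = V(P'). \qquad (2)$$

Due to (1) and (2), we may use the same argument as for case 6 below (which is
the case that covers patterns of the form $(P_1 \text{ UNION } P_2)$) to show that, for any finite
WoLD $W$, Algorithm 1 computes query result $\llbracket (\alpha, (path_1 | path_2), \beta) \mid \mu_{in} \rrbracket^{ctx}_{W}$ by
looking up a finite number of IRIs only.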

Case 4: Suppose $P$ is a PP pattern $(x_L, (path)^*, ?v_R)$ s.t. $x_L \in (I \cup L)$ and $?v_R \in V$.

We have to show that, for any finite WoLD $W$, Algorithm 1 computes query result
$\llbracket (x_L, (path)^*, ?v_R) \mid \mu_{in} \rrbracket^{ctx}_{W}$ by looking up a finite number of IRIs only. The
corresponding fragment of Algorithm 1 that covers this case is given as follows.

24: if P is of the form (x_L, (path)*, ?v_R) such that x_L ∈ (I ∪ L) and ?v_R ∈ V then
25:    Create a new empty multiset M = (Ω, card) with Ω = ∅ and dom(card) = ∅
26:    X := ExecALPW1(x_L, path)
27:    for all x ∈ X do
28:       if ?v_R ∉ dom(μ_in) or μ_in(?v_R) = x then
29:          Create a new solution mapping μ such that dom(μ) = {?v_R} and μ(?v_R) = x
30:          Add μ to Ω
31:          Adjust card such that card(μ) = 1
32:    return M

Line 26 of the given fragment of Algorithm 1 calls a function ExecALPW1. This function
is given by Algorithm 2; it calls another function, named ExecALPW2 (cf. Algorithm 3).
It is easily seen that function ExecALPW1 implements the auxiliary function
ALPW1 as used in Definition 3 (cf. Figure 4). Before we discuss Algorithm 1, we prove
the following two claims:

Claim 1: Function ExecALPW2 implements the other auxiliary function, ALPW2.

Claim 2: During any execution of ExecALPW2, the execution of Algorithm 1 in
line 5 looks up a finite number of IRIs only.

To prove these claims we use the fact that $\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) = V(P)$. Therefore, by
Definition 5, we know that, for any two variables $?v \in V$ and $?w \in V$, it holds that
$\mathrm{CBV}((?v, path, ?w) \mid \{?v\}) = \{?v, ?w\}$. Hence, $\mathrm{CBV}(P' \mid \mathrm{dom}(\mu')) = V(P')$ where
$P' = (?x, path, ?y)$ is the PP pattern created in line 3 of function ExecALPW2 (cf.
Algorithm 3) and $\mu'$ is the solution mapping created in line 4. Therefore, by induction
we can assume that the execution of Algorithm 1 in line 5 has two properties: (i) it
returns $\llbracket P' \mid \mu' \rrbracket^{ctx}_{W}$ and (ii) it looks up a finite number of IRIs only. While the latter
directly verifies Claim 2, we use the former to show Claim 1; in particular, we use
$(\Omega, card) = \llbracket P' \mid \mu' \rrbracket^{ctx}_{W}$, where $(\Omega, card)$ is the multiset initialized in line 5. Then,
due to the properties of solution mapping $\mu'$ (cf. line 4), for each solution mapping
$\mu \in \Omega$, it holds that $\mu(?x) = \gamma$. Consequently, function ExecALPW2 implements the
auxiliary function ALPW2, where $\Omega$ in function ExecALPW2 corresponds to the set of all
solution mappings that are considered by the loop in ALPW2 (cf. lines 6-7 in Figure 4).

After proving Claims 1 and 2, we now come back to Algorithm 1. For the multiset
$M$ that is populated by lines 27-31 in Algorithm 1, let $M^*$ denote the fully populated
version of $M$ (i.e., before executing the return statement in line 32). Since functions
ExecALPW1 and ExecALPW2 implement ALPW1 and ALPW2, respectively, it can be easily
seen that $M^* = \llbracket P \mid \mu_{in} \rrbracket^{ctx}_{W}$ (i.e., Algorithm 1 returns the expected result for PP



Algorithm 2 ExecALPW1(γ, path), which computes ALPW1(γ, path, W) (as given
in Figure 4) for the queried WoLD W.

1: Visited := ∅
2: Visited := ExecALPW2(γ, path, Visited)
3: return Visited


Algorithm 3 ExecALPW2(γ, path, Visited), which computes auxiliary function
ALPW2(γ, path, Visited, W) (as given in Figure 4) for the queried WoLD W.

1: if γ ∉ Visited then
2:    Add γ to Visited
3:    Create a PP pattern P' = (?x, path, ?y) with ?x, ?y ∈ V
4:    Create a new solution mapping μ' such that dom(μ') = {?x} and μ'(?x) = γ
5:    (Ω, card) := EvalCtxBased(P', μ') (i.e., call Algorithm 1 to compute ⟦P' | μ'⟧^ctx_W)
6:    for all μ ∈ Ω do
7:       Visited := ExecALPW2(μ(?y), path, Visited)
8: return Visited


pattern $P$). It remains to show that, during the computation of this result over a finite
WoLD, Algorithm 1 looks up a finite number of IRIs only: Due to the use of set Visited
in function ExecALPW2, none of the IRIs that recursive calls of this function discover
is considered more than once. As a consequence of this observation and of Claim 2, it
follows that, if the queried WoLD $W = (D, data, adoc)$ is finite, then $\mathrm{dom}(adoc)$ is
finite and, thus, any execution of function ExecALPW2 (including all recursive calls in
line 7) looks up a finite number of IRIs only, and so does the execution of ExecALPW1
in line 26 of Algorithm 1. Since none of the other lines of the corresponding fragment
of Algorithm 1 (i.e., lines 24-32) involves IRI lookups, the algorithm looks up a finite
number of IRIs to compute $\llbracket P \mid \mu_{in} \rrbracket^{ctx}_{W}$ for any finite WoLD $W$.
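The role of the set Visited in this argument can be seen in a small sketch of Algorithms 2 and 3. It is written under simplifying assumptions that are not part of the paper: the Web is a hypothetical in-memory dictionary `WEB`, the PP expression is a single IRI predicate (so the call to Algorithm 1 in line 5 of Algorithm 3 reduces to one lookup, modeled by the hypothetical helper `step`), and all terms are treated as IRIs.

```python
# Hypothetical toy Web with a cycle (Bob -> Alice -> Tim -> Bob) and one IRI
# ("Carol") whose lookup yields no document.
WEB = {
    "Bob":   [("Bob", "knows", "Alice")],
    "Alice": [("Alice", "knows", "Tim"), ("Alice", "knows", "Carol")],
    "Tim":   [("Tim", "knows", "Bob")],
}
LOOKUPS = []  # every IRI lookup is recorded here

def step(iri, pred):
    """Objects reachable from `iri` via one `pred` link; costs one lookup."""
    LOOKUPS.append(iri)
    return {o for (s, p, o) in WEB.get(iri, []) if s == iri and p == pred}

def exec_alpw2(gamma, pred, visited):
    """Sketch of Algorithm 3: expand gamma only if it was not visited before."""
    if gamma not in visited:
        visited.add(gamma)
        for nxt in step(gamma, pred):
            exec_alpw2(nxt, pred, visited)
    return visited

def exec_alpw1(gamma, pred):
    """Sketch of Algorithm 2: start the traversal with an empty Visited set."""
    return exec_alpw2(gamma, pred, set())

reachable = exec_alpw1("Bob", "knows")
```

Even though the toy Web contains a cycle, each visited term triggers exactly one lookup, so the traversal stops after at most as many lookups as there are distinct reachable terms.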

Case 5: Suppose $P$ is a PP pattern $(?v_L, (path)^*, ?v_R)$ such that $?v_L, ?v_R \in V$.

The fragment of Algorithm 1 that covers this case is given as follows:

33: if P is of the form (?v_L, (path)*, ?v_R) such that ?v_L ∈ V and ?v_R ∈ V then
34:    if ?v_L ∈ dom(μ_in) then
35:       Create a new empty multiset M = (Ω, card) with Ω = ∅ and dom(card) = ∅
36:       X := ExecALPW1(μ_in(?v_L), path)
37:       for all x ∈ X do
38:          if ?v_R ∉ dom(μ_in) or μ_in(?v_R) = x then
39:             Create a new solution mapping μ such that (i) dom(μ) = {?v_L, ?v_R},
                                (ii) μ(?v_L) = μ_in(?v_L), and (iii) μ(?v_R) = x
40:             Add μ to Ω
41:             Adjust card such that card(μ) = 1
42:       return M
43:    else
44:       Create PP pattern P' = (?v_R, (^path)*, ?v_L)
45:       M := EvalCtxBased(P', μ_in)
46:       return M












The algorithm distinguishes whether $?v_L \in \mathrm{dom}(\mu_{in})$ or $?v_L \notin \mathrm{dom}(\mu_{in})$. In the former
case, Algorithm 1 executes lines 35-42, which are similar to the fragment of Algorithm 1
that covers the previous Case 4 (cf. lines 25-32 before), and the proof that
executing lines 35-42 has the desired properties for PP pattern $(?v_L, (path)^*, ?v_R)$ is
also similar to the discussion of Case 4. Hence, we omit repeating this discussion and
focus on the second sub-case, $?v_L \notin \mathrm{dom}(\mu_{in})$ (which is covered by lines 44-46). As a
basis for discussing this case we need the following two lemmas. We prove these lemmas
after completing the proof of Lemma 1 (cf. page 26 for the proof of Lemma 3 and
page 28 for the proof of Lemma 4).

Lemma 3. Let $P = (?v_L, path, ?v_R)$ be a PP pattern such that $?v_L, ?v_R \in V$, and let
$X \subseteq V$ be a set of variables. If $\mathrm{CBV}(P \mid X) = V(P)$, then $?v_L \in X$ or $?v_R \in X$.

Lemma 4. For any PP expression $path$ and any pair of variables $?v_L, ?v_R \in V$,
the two PP patterns $P = (?v_L, (path)^*, ?v_R)$ and $P' = (?v_R, (\hat{}path)^*, ?v_L)$ are
semantically equivalent under context-based semantics; i.e., $\llbracket P \rrbracket^{ctx}_{W} = \llbracket P' \rrbracket^{ctx}_{W}$ holds
for any WoLD $W$.

Due to the fact that $\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) = V(P)$, we can use Lemma 3 to show that, if
$?v_L \notin \mathrm{dom}(\mu_{in})$, then $?v_R \in \mathrm{dom}(\mu_{in})$. Therefore, the recursive call in line 45 (which
swaps the subject and the object) will result in executing an instance of Algorithm 1
that meets the first sub-case (i.e., the recursive call in line 45 performs lines 35-42).

Moreover, the fact that $\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) = V(P)$ can also be used to show that
$\mathrm{CBV}(P' \mid \mathrm{dom}(\mu_{in})) = V(P')$ where $P' = (?v_R, (\hat{}path)^*, ?v_L)$ is the PP pattern created
in line 44. Then, by induction we can assume that, for any finite WoLD $W$, the
recursive call in line 45 looks up a finite number of IRIs only and returns $\llbracket P' \mid \mu_{in} \rrbracket^{ctx}_{W}$.
As a consequence, we can use Lemma 4 and Definition 6 to show that Algorithm 1 has
the desired properties for graph pattern $P$ with $?v_L \notin \mathrm{dom}(\mu_{in})$.


Case 6: Suppose $P$ is a PP pattern $(?v_L, (path)^*, x_R)$ s.t. $?v_L \in V$ and $x_R \in (I \cup L)$.
The fragment of Algorithm 1 that covers this case is given as follows:

47: if P is of the form (?v_L, (path)*, x_R) such that ?v_L ∈ V and x_R ∈ (I ∪ L) then
48:    Create PP pattern P' = (x_R, (^path)*, ?v_L)
49:    M := EvalCtxBased(P', μ_in)
50:    return M

Let $P'$ be the PP pattern created in line 48; i.e., $P' = (x_R, (\hat{}path)^*, ?v_L)$. To show
that, for any finite WoLD $W$, Algorithm 1 computes $\llbracket P \mid \mu_{in} \rrbracket^{ctx}_{W}$ by looking up a finite
number of IRIs only, it suffices to prove the following two claims:

Claim 1: $\llbracket P \mid \mu_{in} \rrbracket^{ctx}_{W} = \llbracket P' \mid \mu_{in} \rrbracket^{ctx}_{W}$ for any WoLD $W$.

Claim 2: $\mathrm{CBV}(P' \mid \mathrm{dom}(\mu_{in})) = V(P')$.

Then, by induction it follows that Algorithm 1 has the desired properties for pattern $P$.

To verify the first claim we recall that, for any WoLD $W$, $\llbracket P \rrbracket^{ctx}_{W} = \llbracket P' \rrbracket^{ctx}_{W}$ (cf.
Definition 3). Therefore, by Definition 6, Claim 1 follows trivially.



It remains to prove Claim 2. By Definition 5, we have: 



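Case 7: Suppose $P$ is a PP pattern $(x_L, (\mathit{path})^*, x_R)$ such that $x_L, x_R \in (\mathcal{I} \cup \mathcal{L})$.
The fragment of Algorithm 1 that covers this case is given as follows:

51: if $P$ is of the form $(x_L, (\mathit{path})^*, x_R)$ such that $x_L \in (\mathcal{I} \cup \mathcal{L})$ and $x_R \in (\mathcal{I} \cup \mathcal{L})$ then
52:   $X$ := ExecALPW1($x_L$, $\mathit{path}$)
53:   for all $x \in X$ do
54:     if $x = x_R$ then
55:       return a new multiset $(\Omega, \mathit{card})$ with $\Omega = \{\mu_\emptyset\}$ and $\mathit{card} = \mathit{card1}_{(\Omega)}$
56:   return a new empty multiset $M = (\Omega, \mathit{card})$ with $\Omega = \emptyset$ and $\mathrm{dom}(\mathit{card}) = \emptyset$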

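This fragment of the algorithm leverages the fact that the definition of query result
$[\![(x_L, (\mathit{path})^*, x_R)]\!]^{ctx}_W$ (cf. Figure 3) can be rewritten as follows:

$$[\![(x_L, (\mathit{path})^*, x_R)]\!]^{ctx}_W = \left\langle \begin{cases} \{\mu_\emptyset\} & \text{if } x_R \in \mathrm{ALWP1}(x_L, \mathit{path}, W), \\ \emptyset & \text{else} \end{cases} \;,\; \mathit{card1}_{(\Omega)} \right\rangle$$

Then, the discussion of this case resembles the discussion of Case 4 above.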
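The control flow of lines 51-56 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `step` is a hypothetical stand-in for one evaluation of the PP expression (including any IRI lookups), the in-memory `edges` dictionary is made-up toy data, and the result cardinality of 1 is an assumption about what $\mathit{card1}$ assigns.

```python
from collections import Counter, deque

def reachable(start, step):
    """Collect every term reachable from `start` by zero or more uses of `step`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in step(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def eval_const_star(x_left, x_right, step):
    """Sketch of lines 51-56: both ends of the pattern are constants."""
    empty_mapping = frozenset()          # the empty solution mapping
    if x_right in reachable(x_left, step):
        # Assumption: the single result mapping has cardinality 1.
        return Counter({empty_mapping: 1})
    return Counter()                     # empty multiset

# Hypothetical toy data: a -> b -> c
edges = {"a": ["b"], "b": ["c"], "c": []}
step = lambda node: edges.get(node, [])
```

Because the set `seen` grows monotonically and is bounded by the terms of a finite data set, the traversal terminates, which mirrors the finiteness argument used for this case.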

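Case 8: Suppose $P$ is $(P_1 \text{ AND } P_2)$.

As a basis for discussing this case, we first show that

$$\mathrm{CBV}(P_1 \mid \mathrm{dom}(\mu_{in})) = \mathrm{V}(P_1) \;\text{ or }\; \mathrm{CBV}(P_2 \mid \mathrm{dom}(\mu_{in})) = \mathrm{V}(P_2). \quad (3)$$

Thereafter, we use this fact to show that Algorithm 1 has the desired properties for
$P = (P_1 \text{ AND } P_2)$.

To show (3), we use proof by contradiction. That is, we assume

$$\mathrm{CBV}(P_1 \mid \mathrm{dom}(\mu_{in})) \neq \mathrm{V}(P_1) \;\text{ and }\; \mathrm{CBV}(P_2 \mid \mathrm{dom}(\mu_{in})) \neq \mathrm{V}(P_2).$$

Then, by Definition 5, $\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) = \emptyset$. Since $\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) = \mathrm{V}(P)$,
we have $\mathrm{V}(P) = \emptyset$ and, thus,

$$\mathrm{V}(P_1) = \emptyset \;\text{ and }\; \mathrm{V}(P_2) = \emptyset. \quad (4)$$

Since $\mathrm{CBV}(P' \mid \mathrm{dom}(\mu_{in})) \subseteq \mathrm{V}(P')$ holds for any graph pattern $P'$ (cf. Definition 5),
we have $\mathrm{CBV}(P_1 \mid \mathrm{dom}(\mu_{in})) \subseteq \mathrm{V}(P_1)$ and $\mathrm{CBV}(P_2 \mid \mathrm{dom}(\mu_{in})) \subseteq \mathrm{V}(P_2)$. With (4),
we obtain

$$\mathrm{CBV}(P_1 \mid \mathrm{dom}(\mu_{in})) = \emptyset \;\text{ and }\; \mathrm{CBV}(P_2 \mid \mathrm{dom}(\mu_{in})) = \emptyset.$$

Hence, again with (4),

$$\mathrm{CBV}(P_1 \mid \mathrm{dom}(\mu_{in})) = \mathrm{V}(P_1) \;\text{ and }\; \mathrm{CBV}(P_2 \mid \mathrm{dom}(\mu_{in})) = \mathrm{V}(P_2),$$

which contradicts our assumption and, thus, shows that (3) holds.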

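We now show that, for any finite WoLD $W$, Algorithm 1 computes query result
$[\![(P_1 \text{ AND } P_2) \mid \mu_{in}]\!]^{ctx}_W$ by looking up a finite number of IRIs only. The fragment of
Algorithm 1 that covers this case is given as follows.

57: if $P$ is of the form $(P_1 \text{ AND } P_2)$ then
58:   if $\mathrm{CBV}(P_1 \mid \mathrm{dom}(\mu_{in})) = \mathrm{V}(P_1)$ then $i := 1$; $j := 2$ else $i := 2$; $j := 1$
59:   Create a new empty multiset $M = (\Omega, \mathit{card})$ with $\Omega = \emptyset$ and $\mathrm{dom}(\mathit{card}) = \emptyset$
60:   $(\Omega^{P_i}, \mathit{card}^{P_i})$ := EvalCtxBased($P_i$, $\mu_{in}$)
61:   for all $\mu \in \Omega^{P_i}$ do
62:     $(\Omega^{\mu}, \mathit{card}^{\mu})$ := EvalCtxBased($P_j$, $\mu_{in} \cup \mu$)
63:     for all $\mu' \in \Omega^{\mu}$ do
64:       $\mu^* := \mu \cup \mu'$
65:       $k := \mathit{card}^{P_i}(\mu) \cdot \mathit{card}^{\mu}(\mu')$
66:       if $\mu^* \in \Omega$ then
67:         $\mathit{old} := \mathit{card}(\mu^*)$
68:         Adjust $\mathit{card}$ such that $\mathit{card}(\mu^*) = k + \mathit{old}$
69:       else
70:         Adjust $\mathit{card}$ such that $\mathit{card}(\mu^*) = k$
71:         Add $\mu^*$ to $\Omega$
72:   return $M$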

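The algorithm first determines whether $\mathrm{CBV}(P_1 \mid \mathrm{dom}(\mu_{in})) = \mathrm{V}(P_1)$ (which is decidable
by using Definition 5 recursively). If $\mathrm{CBV}(P_1 \mid \mathrm{dom}(\mu_{in})) = \mathrm{V}(P_1)$, the algorithm
lets $i = 1$ and $j = 2$; if $\mathrm{CBV}(P_1 \mid \mathrm{dom}(\mu_{in})) \neq \mathrm{V}(P_1)$, it lets $i = 2$ and $j = 1$. Due to (3), it holds
that $\mathrm{CBV}(P_i \mid \mathrm{dom}(\mu_{in})) = \mathrm{V}(P_i)$. Therefore, by induction we can assume that, when
Algorithm 1 calls itself in line 60, the recursive execution looks up a finite number of
IRIs only and for the result $(\Omega^{P_i}, \mathit{card}^{P_i})$ it holds that $(\Omega^{P_i}, \mathit{card}^{P_i}) = [\![P_i \mid \mu_{in}]\!]^{ctx}_W$.
Next, the algorithm iterates over all solution mappings $\mu \in \Omega^{P_i}$. We claim that

$$\forall \mu \in \Omega^{P_i} : \mathrm{CBV}(P_j \mid \mathrm{dom}(\mu_{in}) \cup \mathrm{dom}(\mu)) = \mathrm{V}(P_j). \quad (5)$$

Note, if (5) holds, by induction we can assume that, for each solution mapping $\mu \in \Omega^{P_i}$,
the recursive call in line 62 looks up a finite number of IRIs only and for the result
$(\Omega^{\mu}, \mathit{card}^{\mu})$ it holds that $(\Omega^{\mu}, \mathit{card}^{\mu}) = [\![P_j \mid \mu_{in} \cup \mu]\!]^{ctx}_W$.
Hence, before we continue the discussion of the algorithm, we prove the claim:
Let $\mu$ be an arbitrary solution mapping with $\mu \in \Omega^{P_i}$. W.l.o.g., it suffices to show
that $\mathrm{CBV}(P_j \mid \mathrm{dom}(\mu_{in}) \cup \mathrm{dom}(\mu)) = \mathrm{V}(P_j)$ holds, for which we use the fact that
$\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) = \mathrm{V}(P)$ holds. In particular, since $\mathrm{CBV}(P_i \mid \mathrm{dom}(\mu_{in})) = \mathrm{V}(P_i)$
holds as well, we note that $\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) = \mathrm{V}(P)$ holds only because at least one
of the following conditions is satisfied (cf. Definition 5): $\mathrm{CBV}(P_j \mid \mathrm{dom}(\mu_{in})) = \mathrm{V}(P_j)$,
$\mathrm{CBV}(P_j \mid \mathrm{dom}(\mu_{in}) \cup \mathrm{SBV}(P_i)) = \mathrm{V}(P_j)$, or $\mathrm{V}(P) = \emptyset$. We now show that each of these
conditions entails $\mathrm{CBV}(P_j \mid \mathrm{dom}(\mu_{in}) \cup \mathrm{dom}(\mu)) = \mathrm{V}(P_j)$.

1. If $\mathrm{CBV}(P_j \mid \mathrm{dom}(\mu_{in})) = \mathrm{V}(P_j)$, then $\mathrm{CBV}(P_j \mid \mathrm{dom}(\mu_{in}) \cup \mathrm{dom}(\mu)) = \mathrm{V}(P_j)$
follows by using Fact 1.

2. If $\mathrm{CBV}(P_j \mid \mathrm{dom}(\mu_{in}) \cup \mathrm{SBV}(P_i)) = \mathrm{V}(P_j)$, then, due to $\mu \in [\![P_i \mid \mu_{in}]\!]^{ctx}_W$ and,
thus, $\mathrm{SBV}(P_i) \subseteq \mathrm{dom}(\mu)$, we obtain $\mathrm{CBV}(P_j \mid \mathrm{dom}(\mu_{in}) \cup \mathrm{dom}(\mu)) = \mathrm{V}(P_j)$ by
Fact 1.

3. If $\mathrm{V}(P) = \emptyset$, then $\mathrm{CBV}(P_j \mid \mathrm{dom}(\mu_{in}) \cup \mathrm{dom}(\mu)) = \mathrm{V}(P_j)$ is a trivial consequence
of $\mathrm{V}(P_j) \subseteq \mathrm{V}(P)$ and $\mathrm{CBV}(P_j \mid \mathrm{dom}(\mu_{in})) \subseteq \mathrm{V}(P_j)$.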

Hence, we verified the correctness of (5) and now come back to Algorithm 1. As mentioned
before, after computing $[\![P_i \mid \mu_{in}]\!]^{ctx}_W = (\Omega^{P_i}, \mathit{card}^{P_i})$ (in line 60), for each
$\mu \in \Omega^{P_i}$ the recursive call in line 62 computes $[\![P_j \mid \mu_{in} \cup \mu]\!]^{ctx}_W = (\Omega^{\mu}, \mathit{card}^{\mu})$ by
looking up a finite number of IRIs only. Then, the algorithm populates a new, initially
empty multiset $M$ incrementally as follows.

For each pair of a solution mapping $\mu \in \Omega^{P_i}$ and a corresponding solution mapping
$\mu' \in \Omega^{\mu}$, the algorithm generates a joined solution mapping $\mu^* = \mu \cup \mu'$ (which
is possible because, due to $\mu' \in \Omega^{\mu}$, $\mu$ and $\mu'$ are compatible) and adds $\mu^*$ exactly
$k$ times to multiset $M$, where $k = \mathit{card}^{P_i}(\mu) \cdot \mathit{card}^{\mu}(\mu')$. Let $M^*$ denote the
resulting, fully populated version of multiset $M$ (i.e., after populating it incrementally
based on all $\mu' \in \Omega^{\mu}$ for all $\mu \in \Omega^{P_i}$). It is easily seen that $M^*$ is the expected
result of the $\mu_{in}$-restricted evaluation of graph pattern $(P_1 \text{ AND } P_2)$ over WoLD $W$ (i.e.,
$M^* = [\![(P_1 \text{ AND } P_2) \mid \mu_{in}]\!]^{ctx}_W$). Hence, the algorithm returns $M^*$. Since each of the
recursive calls looks up a finite number of IRIs and the intermediate result $[\![P_i \mid \mu_{in}]\!]^{ctx}_W$
is finite (because of the finiteness of the queried WoLD $W$), the number of IRIs looked
up during the computation of $[\![(P_1 \text{ AND } P_2) \mid \mu_{in}]\!]^{ctx}_W$ is finite.
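The incremental population of $M$ in lines 59-72 amounts to a nested-loop join over multisets. The following Python sketch illustrates that bookkeeping under stated assumptions: solution mappings are modelled as frozen sets of (variable, term) pairs, multisets as `collections.Counter` objects, and `eval_first` and `eval_second` are hypothetical callables standing in for the two recursive calls to EvalCtxBased. The toy evaluators `eval_p1` and `eval_p2` are invented for illustration only.

```python
from collections import Counter

def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2.get(var, val) == val for var, val in m1.items())

def join_and(eval_first, eval_second, mu_in):
    """Sketch of lines 59-72 for a pattern (P_i AND P_j)."""
    result = Counter()                                   # line 59
    for mu_items, card_mu in eval_first(mu_in).items():  # lines 60-61
        mu = dict(mu_items)
        restriction = {**mu_in, **mu}                    # mu_in united with mu
        for mu2_items, card_mu2 in eval_second(restriction).items():  # lines 62-63
            mu2 = dict(mu2_items)
            if not compatible(mu, mu2):                  # defensive check only
                continue
            merged = frozenset({**mu, **mu2}.items())    # line 64
            result[merged] += card_mu * card_mu2         # lines 65-71
    return result

# Hypothetical evaluators: P_i binds ?x twice, P_j extends each ?x with ?y three times.
def eval_p1(mu_in):
    return Counter({frozenset({"?x": "a"}.items()): 2})

def eval_p2(restriction):
    x = restriction["?x"]
    return Counter({frozenset({"?x": x, "?y": "b"}.items()): 3})

joined = join_and(eval_p1, eval_p2, {})
```

Using `Counter` makes the case distinction of lines 66-71 implicit, since a missing key is treated as cardinality 0 before the increment.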


Case 9: Suppose $P$ is $(P_1 \text{ UNION } P_2)$.

We have to show that, for any finite WoLD $W$, Algorithm 1 computes query result
$[\![(P_1 \text{ UNION } P_2) \mid \mu_{in}]\!]^{ctx}_W$ by looking up a finite number of IRIs only. The corresponding
fragment of Algorithm 1 for this case is given as follows.

73: if $P$ is of the form $(P_1 \text{ UNION } P_2)$ then
74:   $M_1$ := EvalCtxBased($P_1$, $\mu_{in}$)
75:   $M_2$ := EvalCtxBased($P_2$, $\mu_{in}$)
76:   $M := M_1 \sqcup M_2$ (this multiset union is defined in Section 3.1 and
      can be computed by using a standard algorithm)
77:   return $M$

As a basis for discussing this case we emphasize that

$$\mathrm{CBV}(P_1 \mid \mathrm{dom}(\mu_{in})) = \mathrm{V}(P_1) \;\text{ and }\; \mathrm{CBV}(P_2 \mid \mathrm{dom}(\mu_{in})) = \mathrm{V}(P_2), \quad (6)$$

which follows from (i) Definition 5, (ii) $\mathrm{V}(P) = \mathrm{V}(P_1) \cup \mathrm{V}(P_2)$, and (iii) the fact
that $\mathrm{CBV}(P \mid \mathrm{dom}(\mu_{in})) = \mathrm{V}(P)$. Therefore, by induction we can assume that each
of the two recursive calls in lines 74 and 75 looks up a finite number of IRIs in the
queried WoLD $W$, and for the results $M_1$ and $M_2$ it holds that $M_1 = [\![P_1 \mid \mu_{in}]\!]^{ctx}_W$ and
$M_2 = [\![P_2 \mid \mu_{in}]\!]^{ctx}_W$. Then, it is easily seen that $M = M_1 \sqcup M_2$ is the expected result of
the $\mu_{in}$-restricted evaluation of graph pattern $(P_1 \text{ UNION } P_2)$ over WoLD $W$ (i.e., $M =
[\![(P_1 \text{ UNION } P_2) \mid \mu_{in}]\!]^{ctx}_W$) and the number of IRIs looked up during the computation of
this result is finite.
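As a small illustration of line 76, the multiset union can be sketched in Python with `collections.Counter`. This assumes that the union of Section 3.1 adds the cardinalities of mappings occurring in both operands, as is usual for SPARQL multisets; the mappings `mu_a` and `mu_b` are made-up examples.

```python
from collections import Counter

def multiset_union(m1, m2):
    """Additive multiset union: cardinalities of shared mappings are summed."""
    return m1 + m2

# Hypothetical solution mappings, modelled as frozen sets of (variable, term) pairs.
mu_a = frozenset({("?x", "a")})
mu_b = frozenset({("?x", "b")})

m1 = Counter({mu_a: 2})
m2 = Counter({mu_a: 1, mu_b: 1})
union = multiset_union(m1, m2)
```

The union itself performs no lookups, so the number of IRIs looked up for the whole pattern is just the sum of the lookups of the two recursive calls.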


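Case 10: Suppose $P$ is $(P_1 \text{ OPT } P_2)$.

The corresponding fragment of Algorithm 1 for this case is given as follows.

78: if $P$ is of the form $(P_1 \text{ OPT } P_2)$ then
79:   Create a new empty multiset $M_{out} = (\Omega_{out}, \mathit{card}_{out})$ with $\Omega_{out} = \emptyset$ and $\mathrm{dom}(\mathit{card}_{out}) = \emptyset$
80:   $(\Omega^{P_1}, \mathit{card}^{P_1})$ := EvalCtxBased($P_1$, $\mu_{in}$)
81:   for all $\mu \in \Omega^{P_1}$ do
82:     $(\Omega^{\mu}, \mathit{card}^{\mu})$ := EvalCtxBased($P_2$, $\mu$)
83:     if $\Omega^{\mu} = \emptyset$ then
84:       if $\mu \in \Omega_{out}$ then
85:         $\mathit{old} := \mathit{card}_{out}(\mu)$
86:         Adjust $\mathit{card}_{out}$ such that $\mathit{card}_{out}(\mu) = \mathit{old} + 1$
87:       else
88:         Adjust $\mathit{card}_{out}$ such that $\mathit{card}_{out}(\mu) = 1$
89:         Add $\mu$ to $\Omega_{out}$
90:     else
91:       for all $\mu' \in \Omega^{\mu}$ do
92:         if $\mu'$ and $\mu_{in}$ are compatible then
93:           $\mu^* := \mu \cup \mu'$
94:           $k := \mathit{card}^{P_1}(\mu) \cdot \mathit{card}^{\mu}(\mu')$
95:           if $\mu^* \in \Omega_{out}$ then
96:             $\mathit{old} := \mathit{card}_{out}(\mu^*)$
97:             Adjust $\mathit{card}_{out}$ such that $\mathit{card}_{out}(\mu^*) = k + \mathit{old}$
98:           else
99:             Adjust $\mathit{card}_{out}$ such that $\mathit{card}_{out}(\mu^*) = k$
100:            Add $\mu^*$ to $\Omega_{out}$
101:  return $M_{out}$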

We omit the discussion of this case because it is very similar to the discussion of Case 8
for patterns of the form $(P_1 \text{ AND } P_2)$. □
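For readers who prefer code to pseudocode, lines 78-101 can be sketched in Python as follows. The sketch mirrors the listing as printed, including the restriction passed in line 82 and the compatibility test of line 92; it is an illustration under assumptions, not the authors' implementation. Mappings are modelled as frozen sets of (variable, term) pairs, multisets as `collections.Counter` objects, and `eval_p1` and `eval_p2` are hypothetical stand-ins for the recursive calls.

```python
from collections import Counter

def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2.get(var, val) == val for var, val in m1.items())

def left_join_opt(eval_p1, eval_p2, mu_in):
    """Sketch of lines 78-101 for a pattern (P_1 OPT P_2)."""
    out = Counter()                                        # line 79
    for mu_items, card_mu in eval_p1(mu_in).items():       # lines 80-81
        mu = dict(mu_items)
        omega_mu = eval_p2(mu)                             # line 82
        if not omega_mu:                                   # line 83
            out[mu_items] += 1                             # lines 84-89
            continue
        for mu2_items, card_mu2 in omega_mu.items():       # line 91
            mu2 = dict(mu2_items)
            if compatible(mu2, mu_in):                     # line 92
                merged = frozenset({**mu, **mu2}.items())  # line 93
                out[merged] += card_mu * card_mu2          # lines 94-100
    return out

# Hypothetical evaluators: only ?x = "a" has an optional extension.
def eval_p1(mu_in):
    return Counter({frozenset({("?x", "a")}): 1, frozenset({("?x", "c")}): 2})

def eval_p2(mu):
    if mu.get("?x") == "a":
        return Counter({frozenset({("?x", "a"), ("?y", "b")}): 3})
    return Counter()

opt_result = left_join_opt(eval_p1, eval_p2, {})
```

In this toy run the mapping for `?x = "c"` has no optional extension and is kept with cardinality 1, exactly as lines 86 and 88 prescribe.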


B Proof of Lemma 3 


Suppose it holds that

$$\mathrm{CBV}((?v_L, \mathit{path}, ?v_R) \mid X) = \mathrm{V}((?v_L, \mathit{path}, ?v_R)). \quad (7)$$

We have to show that $?v_L \in X$ or $?v_R \in X$ holds. For this proof we use an induction
on the possible structure of PP expression $\mathit{path}$.

B.1 Base Case

Suppose $\mathit{path}$ is either an IRI $u \in \mathcal{I}$ or of the form $!(u_1 \mid \ldots \mid u_n)$ with $u_1, \ldots, u_n \in \mathcal{I}$.
By using (7) and the fact that $\mathrm{V}((?v_L, \mathit{path}, ?v_R)) \neq \emptyset$, we have $?v_L \in X$ (cf. Definition 5).


B.2 Induction Step 

For the induction step we distinguish four cases (which correspond to the last four cases 
in the grammar of PP expressions as given in Section 3.1). 


Case 1: Suppose $\mathit{path}$ is of the form ${}^{\wedge}\mathit{path}'$ where $\mathit{path}'$ is an arbitrary PP expression.
We claim that

$$\mathrm{CBV}((?v_R, \mathit{path}', ?v_L) \mid X) = \mathrm{V}((?v_R, \mathit{path}', ?v_L)). \quad (8)$$

If (8) holds, then $?v_L \in X$ or $?v_R \in X$ holds by induction. Hence, it remains to show (8).
By Definition 5, we have:

$$\mathrm{CBV}((?v_L, {}^{\wedge}\mathit{path}', ?v_R) \mid X) = \mathrm{CBV}((?v_R, \mathit{path}', ?v_L) \mid X).$$

By using $\mathrm{CBV}((?v_L, {}^{\wedge}\mathit{path}', ?v_R) \mid X) = \mathrm{V}((?v_L, {}^{\wedge}\mathit{path}', ?v_R))$ (cf. (7) above), we
obtain:

$$\mathrm{V}((?v_L, {}^{\wedge}\mathit{path}', ?v_R)) = \mathrm{CBV}((?v_R, \mathit{path}', ?v_L) \mid X).$$

Then, with $\mathrm{V}((?v_L, {}^{\wedge}\mathit{path}', ?v_R)) = \mathrm{V}((?v_R, \mathit{path}', ?v_L))$, we can verify the
correctness of (8).


Case 2: Suppose $\mathit{path}$ is of the form $(\mathit{path}')^*$ where $\mathit{path}'$ is an arbitrary PP expression.
By using an argument similar to the argument used for the previous case, we
can show that $\mathrm{CBV}((?v_R, \mathit{path}', ?v_L) \mid X) = \mathrm{V}((?v_R, \mathit{path}', ?v_L))$. Then, $?v_L \in X$
or $?v_R \in X$ holds by induction.


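Case 3: Suppose $\mathit{path}$ is of the form $(\mathit{path}_1 \mid \mathit{path}_2)$ where $\mathit{path}_1$ and $\mathit{path}_2$ are
arbitrary PP expressions. We claim that:

$$\forall i \in \{1,2\} : \mathrm{CBV}((?v_L, \mathit{path}_i, ?v_R) \mid X) = \mathrm{V}((?v_L, \mathit{path}_i, ?v_R)). \quad (9)$$

If (9) holds, then $?v_L \in X$ or $?v_R \in X$ holds by induction. Hence, it remains to show (9).
By Definition 5, we have:

$$\mathrm{CBV}((?v_L, (\mathit{path}_1 \mid \mathit{path}_2), ?v_R) \mid X) = \bigcap_{i \in \{1,2\}} \mathrm{CBV}((?v_L, \mathit{path}_i, ?v_R) \mid X).$$

By using $\mathrm{CBV}((?v_L, (\mathit{path}_1 \mid \mathit{path}_2), ?v_R) \mid X) = \mathrm{V}((?v_L, (\mathit{path}_1 \mid \mathit{path}_2), ?v_R))$
(cf. (7) above), we obtain:

$$\mathrm{V}((?v_L, (\mathit{path}_1 \mid \mathit{path}_2), ?v_R)) = \bigcap_{i \in \{1,2\}} \mathrm{CBV}((?v_L, \mathit{path}_i, ?v_R) \mid X).$$

Then, with $\mathrm{V}((?v_L, (\mathit{path}_1 \mid \mathit{path}_2), ?v_R)) = \mathrm{V}((?v_L, \mathit{path}_i, ?v_R))$ for all $i \in \{1,2\}$,
we can verify the correctness of (9).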

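Case 4: Suppose $\mathit{path}$ is of the form $\mathit{path}_1/\mathit{path}_2$ where $\mathit{path}_1$ and $\mathit{path}_2$ are
arbitrary PP expressions. In this case, by Definition 5, we have:

$$\mathrm{CBV}((?v_L, \mathit{path}_1/\mathit{path}_2, ?v_R) \mid X) = \mathrm{CBV}(P' \mid X) \setminus \{?v\},$$

where $P' = ((?v_L, \mathit{path}_1, ?v) \text{ AND } (?v, \mathit{path}_2, ?v_R))$ and $?v \in \mathcal{V}$ is an arbitrary variable
such that $?v \notin (X \cup \{?v_L, ?v_R\})$ and $?v \in \mathrm{CBV}(P' \mid X)$. By using the fact
that $\mathrm{CBV}((?v_L, \mathit{path}_1/\mathit{path}_2, ?v_R) \mid X) = \mathrm{V}((?v_L, \mathit{path}_1/\mathit{path}_2, ?v_R))$ (cf. (7)
above), we obtain:

$$\mathrm{V}((?v_L, \mathit{path}_1/\mathit{path}_2, ?v_R)) = \mathrm{CBV}(P' \mid X) \setminus \{?v\}.$$

Consequently, $\mathrm{CBV}(P' \mid X) \neq \emptyset$. Therefore, by Definition 5, either

1. $\mathrm{CBV}(P'_1 \mid X) = \mathrm{V}(P'_1)$ and $\mathrm{CBV}(P'_2 \mid X) = \mathrm{V}(P'_2)$, or

2. $\mathrm{CBV}(P'_1 \mid X) = \mathrm{V}(P'_1)$ and $\mathrm{CBV}(P'_2 \mid X \cup \mathrm{SBV}(P'_1)) = \mathrm{V}(P'_2)$, or

3. $\mathrm{CBV}(P'_2 \mid X) = \mathrm{V}(P'_2)$ and $\mathrm{CBV}(P'_1 \mid X \cup \mathrm{SBV}(P'_2)) = \mathrm{V}(P'_1)$,

where $P'_1 = (?v_L, \mathit{path}_1, ?v)$ and $P'_2 = (?v, \mathit{path}_2, ?v_R)$ (i.e., $(P'_1 \text{ AND } P'_2) = P'$).

W.l.o.g., we discuss the first of these three alternatives only (the discussion of the other
two would be almost identical).

Then, due to $\mathrm{CBV}(P'_1 \mid X) = \mathrm{V}(P'_1)$, by induction we can assume that $?v_L \in X$
or $?v \in X$. However, we can rule out the latter because $?v \notin (X \cup \{?v_L, ?v_R\})$ (see
above). Hence, $?v_L \in X$. In a similar manner it is possible to also show $?v_R \in X$ by
using $\mathrm{CBV}(P'_2 \mid X) = \mathrm{V}(P'_2)$. □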


C Proof of Lemma 4 

Let $P = (?v_L, (\mathit{path})^*, ?v_R)$ and $P' = (?v_R, ({}^{\wedge}\mathit{path})^*, ?v_L)$ be two PP patterns
such that $\mathit{path}$ is an arbitrary PP expression and $?v_L$ and $?v_R$ are two variables (i.e.,
$?v_L, ?v_R \in \mathcal{V}$). Furthermore, let $W$ be an arbitrary WoLD. We have to show that
$[\![P]\!]^{ctx}_W \subseteq [\![P']\!]^{ctx}_W$ (Claim 1) and $[\![P]\!]^{ctx}_W \supseteq [\![P']\!]^{ctx}_W$ (Claim 2) hold.

Proof of Claim 1: Let $\mu^*$ be an arbitrary solution mapping such that $\mu^* \in [\![P]\!]^{ctx}_W$.
W.l.o.g., we show that $[\![P]\!]^{ctx}_W \subseteq [\![P']\!]^{ctx}_W$ by showing that $\mu^* \in [\![P']\!]^{ctx}_W$. To this end,
by Definition 3, we have to show that $\mu^*$ satisfies the following three conditions:

Condition 1: $\mathrm{dom}(\mu^*) = \{?v_L, ?v_R\}$,
Condition 2: $\mu^*(?v_R) \in \mathrm{terms}(W)$, and
Condition 3: $\mu^*(?v_L) \in \mathrm{ALWP1}(\mu^*(?v_R), {}^{\wedge}\mathit{path}, W)$.

On the other hand, since $\mu^* \in [\![P]\!]^{ctx}_W$, $\mu^*$ has the following three properties:

Property 1: $\mathrm{dom}(\mu^*) = \{?v_L, ?v_R\}$,
Property 2: $\mu^*(?v_L) \in \mathrm{terms}(W)$, and
Property 3: $\mu^*(?v_R) \in \mathrm{ALWP1}(\mu^*(?v_L), \mathit{path}, W)$.

Hence, $\mu^*$ satisfies Condition 1. To see that $\mu^*$ also satisfies Conditions 2 and 3, consider
Property 3. Due to this property, there exists a sequence of solution mappings $\mu_1, \ldots, \mu_n$
and two variables $?x, ?y \in \mathcal{V}$ such that (i) $\mathrm{dom}(\mu_i) = \{?x, ?y\}$ for all $i \in \{1, \ldots, n\}$,
(ii) $\mu_1(?x) = \mu^*(?v_L)$, (iii) $\mu_n(?y) = \mu^*(?v_R)$, and (iv) $\mu_i \in [\![(?x, \mathit{path}, ?y)]\!]^{ctx}_W$ for
all $i \in \{1, \ldots, n\}$. Due to the latter, $\mu_i(?y) \in \mathrm{terms}(W)$ for all $i \in \{1, \ldots, n\}$. Thus,
with $\mu_n(?y) = \mu^*(?v_R)$, we have $\mu^*(?v_R) \in \mathrm{terms}(W)$; i.e., $\mu^*$ satisfies Condition 2.

Moreover, by Definition 3, $[\![(?x, \mathit{path}, ?y)]\!]^{ctx}_W = [\![(?y, {}^{\wedge}\mathit{path}, ?x)]\!]^{ctx}_W$ and, thus,
$\mu_i \in [\![(?y, {}^{\wedge}\mathit{path}, ?x)]\!]^{ctx}_W$ for all $i \in \{1, \ldots, n\}$. Therefore, the sequence of solution
mappings $\mu_1, \ldots, \mu_n$ can also be used to show that $\mu_1(?x) \in \mathrm{ALWP1}(\mu_n(?y), {}^{\wedge}\mathit{path}, W)$.
Due to this fact and due to $\mu_1(?x) = \mu^*(?v_L)$ and $\mu_n(?y) = \mu^*(?v_R)$, we can verify
that $\mu^*$ satisfies Condition 3.

Proof of Claim 2: Let $\mu^*$ be an arbitrary solution mapping such that $\mu^* \in [\![P']\!]^{ctx}_W$.
W.l.o.g., we show that $[\![P]\!]^{ctx}_W \supseteq [\![P']\!]^{ctx}_W$ by showing that $\mu^* \in [\![P]\!]^{ctx}_W$. To this end,
by Definition 3, we have to show that $\mu^*$ satisfies the following three conditions:

Condition 1: $\mathrm{dom}(\mu^*) = \{?v_L, ?v_R\}$,
Condition 2: $\mu^*(?v_L) \in \mathrm{terms}(W)$, and
Condition 3: $\mu^*(?v_R) \in \mathrm{ALWP1}(\mu^*(?v_L), \mathit{path}, W)$.

On the other hand, since $\mu^* \in [\![P']\!]^{ctx}_W$, $\mu^*$ has the following three properties:

Property 1: $\mathrm{dom}(\mu^*) = \{?v_L, ?v_R\}$,
Property 2: $\mu^*(?v_R) \in \mathrm{terms}(W)$, and
Property 3: $\mu^*(?v_L) \in \mathrm{ALWP1}(\mu^*(?v_R), {}^{\wedge}\mathit{path}, W)$.

Due to the symmetry of these conditions and properties to the conditions and properties
in the discussion of Claim 1, it is easily seen that Claim 2 can be proved by using an
argument that is reverse to the argument used for proving Claim 1. □
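The intuition behind Lemma 4 can be checked on a small example. The Python sketch below is a deliberately simplified illustration: it works on a single in-memory edge set rather than a WoLD, it ignores cardinalities, and it treats $\mathit{path}$ as a single edge step, so the function `star_pairs` and the toy `edges` are assumptions made for illustration. It compares the pairs reachable via zero or more forward steps with the swapped pairs reachable via zero or more inverse steps.

```python
def star_pairs(edges, nodes):
    """All (s, o) such that o is reachable from s via zero or more edges."""
    successors = {}
    for s, o in edges:
        successors.setdefault(s, set()).add(o)
    pairs = set()
    for start in nodes:
        seen = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in successors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        pairs.update((start, target) for target in seen)
    return pairs

# Hypothetical toy graph with a cycle a -> b -> c -> a and a tail c -> d.
edges = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")}
nodes = {n for edge in edges for n in edge}
inverse_edges = {(o, s) for s, o in edges}

forward = star_pairs(edges, nodes)            # analogue of (?vL, (path)*, ?vR)
backward = star_pairs(inverse_edges, nodes)   # analogue of (?vR, (^path)*, ?vL)
swapped_backward = {(o, s) for s, o in backward}
```

On this graph `forward` and `swapped_backward` coincide, which is the set-level analogue of the equivalence that the proof establishes for context-based semantics.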


